Abstract.

Thinning (skeletonization) is a crucial stage in an
Arabic Character Recognition (ACR) system. It
simplifies the text shape and reduces the amount of
data that needs to be handled, and it is usually used as
a pre-processing stage for recognition and storage
systems. The skeleton of Arabic text can be used for
baseline detection, character segmentation, and
feature extraction, ultimately supporting
classification. In this paper, five state-of-the-art
thinning algorithms are selected and implemented:
SPTA, the Zhang-Suen parallel thinning algorithm,
the Voronoi-based thinning algorithm, and the
thinning and skeletonization algorithms based on
morphological operations. The five algorithms are
applied to the IFN/ENIT dataset, and the results are
discussed and analyzed with respect to preserving
shape and text connectivity, preventing spurious
tails, maintaining a one-pixel-wide skeleton,
avoiding the necking problem, and running time. In
addition, performance measures for checking text
connectivity, detecting spurious tails, and calculating
stroke thickness are proposed and carried out.

Keywords: Thinning, Skeleton, Arabic Character
Recognition, SPTA, Zhang-Suen, Voronoi-based,
morphological, Text connectivity.

1. Introduction 

In image processing, the extraction of the skeleton of
digital images is widely used as a pre-processing
stage for recognition and storage systems. The
skeleton of a character is its single-pixel-wide
representation. The main aim of thinning is to reduce
a character whose strokes are several pixels thick to a
single-pixel-wide shape that forms the basis of the
character. It eliminates a lot of unwanted information
and hence reduces the memory required for storing
the structural information. The resulting skeletons
usually consist of a set of lines, curves, and loops
[1][12][3].
Generally, thinning methods can be broadly classified
into two groups, iterative and non-iterative [10]. In an
iterative method, the pixels that satisfy the deletion
conditions imposed by the algorithm are flagged
during each iteration. The process continues until no
further pixels in the image are flagged. The order of
traversal of pixels in the image may be sequential
[14] or parallel [16]. The basic feature exploited
during the processing is the pixel neighbourhood
(generally the 8 pixels surrounding the main pixel in
a 3 x 3 window). In a sequential method, pixels are
flagged and removed in a specific traversal order, so
the decision for a pixel depends on both the result of
the previous iteration and the pixels already
processed in the current iteration. In a parallel
method, the decision for a pixel is a function of the
output image of the previous iteration only. A
non-iterative method, on the other hand, does not
examine pixels iteration by iteration; it derives the
skeleton from special properties of the shape, for
example by dividing a region into a set of regular or
irregular polygons [12]. Non-iterative methods have
the advantage of producing accurate results, but at
the cost of computation time [1].

Thinning is used in many recognition applications,
such as text, chromosome, and fingerprint analysis
[3][12][10]. It is very important in an Arabic
Character Recognition (ACR) system: it simplifies
the shapes of Arabic text for segmentation, feature
extraction, and classification, which reduces the
amount of data that needs to be handled [2]. Many
ACR systems have been developed based on the text
skeleton. The skeleton has been used extensively to
support the feature extraction and classification
stages [7]. It is also the basis for many Arabic text
segmentation methods; more details about
skeleton-based segmentation methods can be found in
[4]. The skeleton has also been used to estimate the
baseline of Arabic handwritten words [1].

Over the years, few thinning algorithms have been
developed specifically for Arabic text. Instead,
several thinning algorithms designed for other
purposes have been used to extract the skeleton of
Arabic text. Mostafa [11] used the non-iterative
thinning algorithm developed by Kegl and Krzyzak
[6] to segment Arabic cursive printed words into
characters or small primitives. In addition,
Benouareth et al. [5] used the sequential thinning
algorithm created by Pavlidis [15] for Arabic
handwritten word recognition using Hidden Markov
Models with explicit state duration. Many other
thinning algorithms have been used to extract the
skeleton of Arabic text, such as [1][8][16].

Many thinning algorithms have been proposed in the
literature for different purposes, and a
comprehensive survey of these methods is given in
[10]. Generally, thinning research focuses on two
main problems: the execution time of the algorithm
and the shape of the resulting thinned image. A
thinning algorithm may generate good skeletons for
some shapes but poor skeletons for others. It is very
difficult to develop a generalized thinning algorithm
that produces satisfactory results for all varieties of
pattern shapes [1].

An effective thinning algorithm for Arabic text
should ideally meet the following requirements:
preserving shape and text connectivity, preventing
spurious tails, maintaining a one-pixel-wide skeleton,
avoiding the necking problem, and running
efficiently. Failing to preserve connectivity causes
words to be broken into sub-words, which damages
the text features and changes the shape of the text
[1]. The Safe Point Thinning Algorithm (SPTA), the
Zhang-Suen thinning algorithm, the Voronoi-based
thinning algorithm, and the thinning and
skeletonization methods based on morphological
operations (TBMO, SBMO) are selected to show the
impact of thinning on ACR.

This paper is organized as follows. Section 2
describes the five selected thinning algorithms.
Section 3 presents the experimental results. Section 4
discusses the results based on the challenges of
thinning Arabic text. Section 5 summarizes the
obtained results. Finally, Section 6 presents the
conclusion and future directions.



2. The Thinning Algorithms 

A. Safe Point Thinning Algorithm (SPTA)

SPTA is a sequential iterative algorithm proposed by
Naccache and Shinghal [14], in which edge points are
deleted while end points and connectedness are
preserved and excessive erosion is avoided. Edge
points are the pixels on the edges of a shape, and end
points are pixels at the end of a stroke. The input
image is considered to have black shape pixels and
white background pixels. For a pixel with coordinates
(x, y), the pixels (x+1, y), (x-1, y), (x, y-1) and
(x, y+1) constitute the 4-neighbor set, and the 8
pixels in the 3 x 3 window around the pixel (x, y) are
its 8-neighbors (Figure 1) [1].

Figure 1. A point p and its 8-neighbors (t0 to t7). The points t0,
t2, t4 and t6 are referred to as the 4-neighbors of p.

An edge point is a black pixel with at least one white
4-neighbor, an end point is a black pixel with at most
one black 8-neighbor, and a break point is a pixel
whose removal would break the connectedness of the
pattern. An edge point is flagged if it is not an end
point, its removal does not break connectedness, and
its removal does not cause excessive erosion. At the
end of each iteration all the flagged pixels are
removed. If no pixels are flagged, the process ends.
An edge point is called a left-edge point if t4 is
white, a right-edge point if t0 is white, a top-edge
point if t2 is white, and a bottom-edge point if t6 is
white [1].

A Boolean expression over the 8-neighbors of a pixel
is framed for identifying an edge point that can be
deleted without removing an end point, breaking
connectedness, or causing excessive erosion.

For identifying a left-edge point the Boolean
expression is

$S_4 = t_0 \cdot (t_1 + t_2 + t_6 + t_7) \cdot (t_2 + \bar{t}_3) \cdot (t_6 + \bar{t}_5)$   (1)

Similarly, right-edge, top-edge and bottom-edge
points are identified using the Boolean expressions
S0, S2 and S6 respectively:

$S_0 = t_4 \cdot (t_2 + t_3 + t_5 + t_6) \cdot (t_6 + \bar{t}_7) \cdot (t_2 + \bar{t}_1)$   (2)

$S_2 = t_6 \cdot (t_7 + t_0 + t_4 + t_5) \cdot (t_0 + \bar{t}_1) \cdot (t_4 + \bar{t}_3)$   (3)

$S_6 = t_2 \cdot (t_3 + t_4 + t_0 + t_1) \cdot (t_4 + \bar{t}_5) \cdot (t_0 + \bar{t}_7)$   (4)

Here "·" denotes logical AND, "+" logical OR, and the
bar logical NOT.

SPTA is a two-stage process. During the first stage
the edge pixels are identified using the Boolean
expressions S0, S2, S4 and S6 and are flagged. During
the second stage the flagged pixels are removed,
resulting in the thinned image [1].

Algorithm (1) illustrates the SPTA algorithm. 

Algorithm (1): SPTA
Input: binary text image
Output: thinned image

Begin
    scan the image row by row
    for each pixel p(x, y) take its 8-neighbours (see Figure 1)
        let t0, t1, t2, t3, t4, t5, t6, t7 = values of (x+1, y),
            (x+1, y+1), (x-1, y), (x-1, y-1), (x, y-1),
            (x+1, y-1), (x, y+1), and (x-1, y+1) respectively
        examine p:
            if one of the following expressions is true (true
            means p is neither an end point nor a break point)
                S4 = t0 . (t1 + t2 + t6 + t7) . (t2 + ~t3) . (t6 + ~t5)   (1)
                S0 = t4 . (t2 + t3 + t5 + t6) . (t6 + ~t7) . (t2 + ~t1)   (2)
                S2 = t6 . (t7 + t0 + t4 + t5) . (t0 + ~t1) . (t4 + ~t3)   (3)
                S6 = t2 . (t3 + t4 + t0 + t1) . (t4 + ~t5) . (t0 + ~t7)   (4)
            then remove the pixel
End
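
As a rough illustration of how the four Boolean expressions can be evaluated for a single pixel, the following Python sketch may help. It is not the authors' MATLAB implementation. The function names `spta_flags` and `is_deletable_left_edge` are hypothetical, and the sketch assumes that black pixels are represented as 1 (True) and that t0, t2, t4 and t6 are the right, top, left and bottom neighbours, as the edge-point definitions above suggest.

```python
def spta_flags(t):
    """Evaluate S0, S2, S4 and S6 for one pixel.

    t holds the eight neighbour values t[0]..t[7] (truthy = black).
    Assumed layout (not confirmed by the source figure): t0 right,
    t2 top, t4 left, t6 bottom, odd indices on the diagonals.
    """
    t0, t1, t2, t3, t4, t5, t6, t7 = (bool(v) for v in t)
    s4 = t0 and (t1 or t2 or t6 or t7) and (t2 or not t3) and (t6 or not t5)
    s0 = t4 and (t2 or t3 or t5 or t6) and (t6 or not t7) and (t2 or not t1)
    s2 = t6 and (t7 or t0 or t4 or t5) and (t0 or not t1) and (t4 or not t3)
    s6 = t2 and (t3 or t4 or t0 or t1) and (t4 or not t5) and (t0 or not t7)
    return {"S0": s0, "S2": s2, "S4": s4, "S6": s6}


def is_deletable_left_edge(t):
    """A left-edge point (t4 white) is flagged when S4 is true."""
    return (not t[4]) and spta_flags(t)["S4"]


# Tip of a horizontal stroke: only the right neighbour is black (an end point).
end_point = [1, 0, 0, 0, 0, 0, 0, 0]
# Left border of a thick stroke: right, top and bottom sides are black.
thick_edge = [1, 1, 1, 0, 0, 0, 1, 1]
print(is_deletable_left_edge(end_point), is_deletable_left_edge(thick_edge))
```

Under these assumptions the end point is kept and the border pixel of the thick stroke is flagged, which matches the intent described for equation (1).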



B. Zhang-Suen Parallel Thinning Algorithm

The parallel thinning method of Zhang and Suen [16]
is considered the benchmark for most thinning
methods. It is generally considered fast and accurate
in thinning different patterns. The speed comes from
its parallel nature: the new value of a pixel is a
function of the result of the previous iteration only.
The algorithm has two tiers (sub-iterations). During
the first tier, pixels are flagged according to a given
set of conditions. If no pixels are flagged during the
first tier the process stops. Otherwise the flagged
pixels are removed and the second tier starts. A pixel
is flagged if it satisfies the following conditions [1]:

i. The number of transitions from background
to foreground, that is from white to black,
in the pixel neighborhood is one.

ii. The number of black neighbors in the 3 x 3
pixel neighborhood is between two and six.

iii. At least one of t0, t2 and t4 is white.

iv. At least one of t0, t2 and t6 is white.

The Zhang-Suen parallel thinning algorithm is
applied to the Arabic cursive text as given in
Algorithm 2.



Algorithm (2): Zhang-Suen parallel thinning algorithm
Input: binary text image
Output: thinned image

Begin
    scan the image row by row
    for each pixel P(x, y) take its 8-neighbours (Figure 1)
        let t0, t1, t2, t3, t4, t5, t6, t7 = values of (x+1, y),
            (x+1, y+1), (x-1, y), (x-1, y-1), (x, y-1),
            (x+1, y-1), (x, y+1), and (x-1, y+1) respectively
        let A(P) be the number of 01 patterns in the
            ordered set t0 ... t7
        let B(P) be the number of non-zero neighbors of P

    start sub-iteration 1:
        delete P if the following conditions are satisfied:
            a) 2 <= B(P) <= 6
            b) A(P) = 1
            c) t0 * t2 * t6 = 0
            d) t0 * t4 * t6 = 0
    end sub-iteration 1

    start sub-iteration 2:
        delete P if the following conditions are satisfied:
            a) and b) from above
            c') t0 * t2 * t4 = 0
            d') t2 * t4 * t6 = 0
    end sub-iteration 2
End
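
The two sub-iterations can be sketched in Python as follows. This is a minimal sketch of the standard Zhang-Suen formulation, not the MATLAB code used in the experiments. It assumes the image is a list of rows with 1 for black (foreground) pixels and a background border at least one pixel wide. The function name `zhang_suen` and the compass names in the code are illustrative; north, east, south and west are assumed to correspond to t2, t0, t6 and t4.

```python
def zhang_suen(image):
    """Thin a binary image (list of rows, 1 = foreground) in place on a copy."""
    img = [list(row) for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # Clockwise from north: N, NE, E, SE, S, SW, W, NW.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (1, 2):
            flagged = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    b = sum(n)  # B(P): number of black neighbours
                    # A(P): 0 -> 1 transitions around the pixel (cyclic)
                    a = sum(1 for i in range(8)
                            if n[i] == 0 and n[(i + 1) % 8] == 1)
                    north, east, south, west = n[0], n[2], n[4], n[6]
                    if step == 1:
                        ok = north * east * south == 0 and east * south * west == 0
                    else:
                        ok = north * east * west == 0 and north * south * west == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        flagged.append((r, c))
            # Parallel update: delete only after the whole image was examined.
            for r, c in flagged:
                img[r][c] = 0
            changed = changed or bool(flagged)
    return img


# A 3-pixel-thick horizontal bar surrounded by background.
bar = [[0] * 10] + [[0] + [1] * 8 + [0] for _ in range(3)] + [[0] * 10]
for row in zhang_suen(bar):
    print("".join("#" if v else "." for v in row))
```

Because flagged pixels are collected first and removed only after the scan, each decision depends on the previous sub-iteration's image only, which is what makes the method parallel.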

C. Voronoi-based Thinning Algorithm

The Voronoi diagram (VD) is a rediscovered
mathematical concept dating from the 19th century.
It is one of the most fundamental and useful
constructions defined by irregular lattices. Voronoi
vertices, edges, and area neighbors are terms
associated with VDs [9].

The Voronoi-based thinning method is a
non-iterative thinning approach proposed by
Al-Shatnawi [9] for skeletonizing Arabic
handwritten text. The method extracts the skeleton
of handwritten text from sample points selected
along the text contour. A point VD is constructed
from these sample points, and only the VD vertices
located within the text boundaries are kept and
joined. The sampling interval (R) is significant for
the performance; Al-Shatnawi [9] reports R = 5 as the
best choice. Algorithm (3) illustrates the
Voronoi-based thinning algorithm. Figure 2 presents
the thinning process for the handwritten sentence
‘Thiraa bin Ziad’ using the Voronoi-based thinning
method.



Algorithm (3): Voronoi-based thinning algorithm [9]
Input: binary text image
Output: thinned image

Begin
    label every pixel in the image with a number referring
        to the connected component it belongs to; pixels of
        the foreground are labelled with 1
    trace the inner and outer contours
    select samples along the contour using a fixed sampling
        interval R = 5
    construct the VD using all samples as generators
    keep the VD vertices that are located within the text
        boundaries and delete all other VD components
    if two vertices share two or more VD cells, they are
        adjacent vertices
    join each vertex with its adjacent vertices
End

D. Thinning and Skeletonization Based on
Morphological Operations

The TBMO method is applied to the Arabic cursive
text using Algorithm 4 [1][9].



Algorithm (4): TBMO algorithm
Input: binary text image
Output: thinned image

Begin
    scan the image row by row
    for each pixel P(x, y) take its 8-neighbours (Figure 1)
        let t0, t1, t2, t3, t4, t5, t6, t7 = values of (x+1, y),
            (x+1, y+1), (x-1, y), (x-1, y-1), (x, y-1),
            (x+1, y-1), (x, y+1), and (x-1, y+1) respectively
        let A(P) be the number of non-zero neighbors of P
        let B(P) be the number of 0 to 1 (or 1 to 0)
            transitions in the sequence t0 to t7
    start sub-iteration 1:
        mark every edge pixel P = 1 that satisfies none of the
        following conditions (to be deleted in the second step):
            A(P) = 0 (an isolated point)
            A(P) = 1 (tip of a line)
            A(P) = 7 (located in a concavity)
            A(P) = 8 (not a boundary point)
            B(P) >= 2 (on a bridge connecting two or more
                edge pieces)
    end sub-iteration 1
    start sub-iteration 2:
        delete all marked edge points
    end sub-iteration 2
End



The SBMO method extracts the skeleton of the
text image by calculating the distance transform of
the image. It is applied to the Arabic cursive text
using Algorithm 5.



Figure 2. Voronoi-based thinning method: (a) Arabic handwritten
sentence ‘Thiraa bin Ziad’, (b) edge detection and contour
tracing, (c) sampling process, (d) constructed VD, (e) Voronoi
vertices inside the text boundaries, (f) skeleton.

Algorithm (5): SBMO algorithm
Input: binary text image
Output: thinned image

Begin
    start iteration 1:
        let k = 0 (k is an initial value)
        scan the image row by row
        examine each pixel:
            if pixel (x, y) is inside the shape
                h_0(x, y) = 1
            else
                h_0(x, y) = 0
    end iteration 1
    start iteration 2:
        repeat the following until no more change can be made:
            k = k + 1
            for all (x, y) with h_{k-1}(x, y) = k, set
                h_k(x, y) = min{h_{k-1}(i, j)} + 1, where (i, j)
                are the four closest neighbours of (x, y)
    end iteration 2
    start iteration 3:
        the skeleton is the set of points
            {(x, y) | h(x, y) = max{h(i, j)}}
    end iteration 3
End
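
The three iterations of the SBMO pseudocode can be read as a city-block distance transform followed by a local-maximum test. The Python sketch below follows that reading and is only an illustration. The function name `distance_skeleton` is hypothetical, pixels outside the image are assumed to be background, and the skeleton test is interpreted as "h(x, y) is not smaller than any of its four closest neighbours", which the source does not spell out.

```python
def distance_skeleton(image):
    """Return (h, skeleton) for a binary image (list of rows, 1 = shape)."""
    rows, cols = len(image), len(image[0])
    # Iteration 1: h = 1 inside the shape, 0 outside.
    h = [[1 if v else 0 for v in row] for row in image]

    def neighbours4(r, c):
        # Values of the four closest neighbours; outside the image counts as 0.
        values = []
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            values.append(h[rr][cc] if inside else 0)
        return values

    # Iteration 2: raise every pixel at level k whose neighbours are all
    # at level k as well; stop when no pixel changes.
    k = 0
    while True:
        k += 1
        updates = [(r, c) for r in range(rows) for c in range(cols)
                   if h[r][c] == k and min(neighbours4(r, c)) + 1 > k]
        if not updates:
            break
        for r, c in updates:
            h[r][c] = k + 1

    # Iteration 3: keep shape pixels that are local maxima of h.
    skeleton = [[1 if h[r][c] > 0 and h[r][c] >= max(neighbours4(r, c)) else 0
                 for c in range(cols)] for r in range(rows)]
    return h, skeleton


square = [[1] * 5 for _ in range(5)]
h, skeleton = distance_skeleton(square)
for row in skeleton:
    print("".join("#" if v else "." for v in row))
```

For a filled 5 x 5 square this yields the two diagonals. On general shapes a pure local-maximum test may leave gaps, so a practical implementation would need an additional linking step; that step is outside this sketch.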
3. Experimental Results

The SPTA, Zhang-Suen, Voronoi-based, TBMO and
SBMO thinning algorithms were implemented in the
MATLAB programming language on a duo core
2.0 GHz machine in 2011. The five methods were
applied to the demo version of the IFN/ENIT
handwritten dataset [13]. Figure 3 shows examples of
the thinned images obtained using the five methods
implemented in this paper.




Figure 3. (a) Original image and thinned images produced by
(b) SPTA, (c) Zhang-Suen, (d) Voronoi-based, (e) TBMO, and
(f) SBMO.



4. Discussion 

An effective thinning algorithm for Arabic text
should ideally meet the following requirements:
preserving shape and text connectivity, preventing
spurious tails, maintaining a one-pixel-wide skeleton,
avoiding the necking problem, and running
efficiently. The results obtained by the five methods
on the IFN/ENIT dataset were compared based on
the following criteria:



A- Preserving the Text Connectivity

Preserving the text connectivity is the most important
aspect of any thinning method. It can be examined by
comparing the number of connected components in
the thinned image with the number in the original
one. If the numbers are equal, the method maintains
the text connectivity; otherwise it does not. The
following algorithm is proposed to verify whether an
Arabic thinning algorithm preserves the text
connectivity (see Algorithm 6) [1][9].



Algorithm (6): Checking preservation of text connectivity
Input: original and thinned images
Output: whether the thinning method preserves the text
connectivity

Begin
    read the original image
    scan the image row by row
    find the connected components using the
        8-neighbourhood representation
    count the number of connected components of the
        original image
    read the thinned image
    scan the image row by row
    find the connected components using the
        8-neighbourhood representation
    count the number of connected components of the
        thinned image
    compare the numbers of connected components in the
        original and the thinned images
    if they are equal
        return the thinning method preserves the text
        connectivity
    else
        return it does not preserve the text connectivity
End
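
The check amounts to counting 8-connected components in both images and comparing the counts. A minimal Python sketch under stated assumptions: images are lists of rows with 1 for text pixels, and the function names `count_components` and `preserves_connectivity` are illustrative rather than taken from the original implementation.

```python
from collections import deque


def count_components(image):
    """Count 8-connected components of non-zero pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 8-neighbourhood.
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def preserves_connectivity(original, thinned):
    """True when both images have the same number of components."""
    return count_components(original) == count_components(thinned)


original = [[1, 1, 1, 0, 1],
            [1, 1, 1, 0, 1]]
good = [[1, 1, 1, 0, 1],
        [0, 0, 0, 0, 0]]
broken = [[1, 0, 1, 0, 1],
          [0, 0, 0, 0, 0]]
print(preserves_connectivity(original, good), preserves_connectivity(original, broken))
```

Note that equal counts are a necessary condition only: a skeleton that breaks one stroke while merging two others would pass this test, which is a limitation of the criterion itself.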



Figure 4 shows the connected components and their
numbers for the thinned images of Figure 3. As
shown in Figure 4, the Zhang-Suen, Voronoi-based,
TBMO and SBMO methods generated skeletons that
preserve the connectivity, because they have the same
number of connected components as the original
image. However, SPTA did not.





Figure 4. Connected components in the thinned images of (b)
SPTA, (c) Zhang-Suen, (d) Voronoi-based, (e) SBMO, and (f)
TBMO methods.

The text connectivity check was run on the thinned
images produced by the five algorithms for the
IFN/ENIT dataset. The numbers of skeletons that
preserved the text connectivity are listed in Table 1.
The performance is calculated as the ratio between
the number of images whose text connectivity was
preserved and 569, the number of images used in the
testing.



B- Producing One Pixel Width Skeleton

In this section, an experiment is carried out to verify
whether the methods implemented in this work
produce skeletons that are one pixel wide. The stroke
thickness is measured by the following algorithm,
which was first proposed by Hew and Alder (see
Algorithm 7) [9].



Table 1. Results of checking the text connectivity
preservation of the output images of the five selected
thinning methods

Thinning method   Skeletons that preserved the     Performance (%)
                  text connectivity (out of 569)
SPTA              0                                0.0
Zhang-Suen        513                              90.16
Voronoi-based     569                              100.0
SBMO              569                              100.0
TBMO              569                              100.0


Figure 5 shows the performance of preserving the 
text connectivity for the five thinning methods, for the 
IFN/ENIT dataset. 



Figure 5. The performance of preserving the text connectivity for
the five thinning methods for the IFN/ENIT handwritten dataset



Based on Figure 5, the Voronoi-based, TBMO and
SBMO methods generated skeletons that preserve the
connectivity for the entire IFN/ENIT handwritten
dataset. The Zhang-Suen algorithm preserved the
connectivity for 513 of the 569 images. It does not
preserve it for all images because it deletes 2 x 2
windows, which leads to disconnection in some
places, especially at junctions. SPTA did not preserve
the connectivity for any image. Therefore, SPTA is
not used in the other comparisons.



Algorithm (7): Estimate thickness (O is an object in
an image)
Input: binary text image
Output: number of pixels

Begin
    scan the image row by row
    calculate the area of object O, A(O)
    calculate the perimeter length of object O, L(O),
        using the 8-neighbouring method
    calculate the stroke thickness T:
        if L > 2 x A then T = A / (L / 2)
        else T = round(A / L)
End



The stroke thickness estimation algorithm was run on
the skeletons produced by the four remaining
thinning methods (i.e. the five selected thinning
algorithms except SPTA) for the IFN/ENIT dataset.
The numbers of skeletons that are one pixel wide are
listed in Table 2. The performance is calculated as
the ratio between the number of images with a
one-pixel-wide skeleton and 569, the number of
images used in the testing.

Table 2. Results of one unit pixel width skeleton on
the resulting images of the four selected thinning
algorithms.

Thinning method   Skeletons with one-pixel width   Performance (%)
                  (out of 569 thinned images)
Zhang-Suen        413                              72.58
Voronoi-based     569                              100.0
SBMO              569                              100.0
TBMO              487                              85.59



The performance of one unit pixel width for the
four selected thinning algorithms is shown in Figure 6.

Figure 6. The performance of one unit pixel width for the four
selected thinning algorithms.



Based on Figure 6, the Voronoi-based and SBMO
methods generated skeletons that are one pixel wide
for the entire IFN/ENIT handwritten dataset. The
Zhang-Suen and TBMO methods did not, because
they produce many spurious tails, as shown in the
next section.

C- Producing Spurious Tails

In this section, an experiment is carried out to
measure how many spurious tails the four selected
thinning algorithms produce. Spurious tails are one
of the most common problems of text thinning. They
are usually perpendicular to the thinned text, form a
T-junction with it, and are distinct from strokes.
They look like short terminals that do not exceed the
thickness of the text. The following algorithm is
proposed for finding spurious tails (see Algorithm 8)
[9].



Algorithm (8): Checking the spurious tails
Input: thinned text image and original text image
Output: whether the thinning algorithm produces
spurious tails

Begin
    read the original image
    calculate the thickness of the characters using
        Algorithm 7
    read the thinned image
    for each connected component do the following:
        apply the 8-neighbourhood chain code
            representation
        if there is a change in the movement direction
            start counting the number of pixels
            if the branch terminates and the number of
            pixels is less than the thickness of the characters
                return the thinning algorithm produces
                spurious tails
    else return it does not produce spurious tails
End



The numbers of skeletons that contain spurious tails
are listed in Table 3. The performance is calculated
by inverting the ratio between the number of images
with spurious tails and the total number of images
used in the test (i.e. 569).
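
The two kinds of performance figures used in this section can be reproduced with a couple of lines of arithmetic. The sketch below is illustrative only. The function names are hypothetical, and "inverting the ratio" is assumed to mean taking one minus the ratio, which is consistent with the reported values.

```python
TOTAL_IMAGES = 569  # number of IFN/ENIT images used in the tests


def preserved_ratio(count, total=TOTAL_IMAGES):
    """Performance when `count` images meet the criterion (Tables 1 and 2)."""
    return 100.0 * count / total


def inverted_ratio(count, total=TOTAL_IMAGES):
    """Performance when `count` images show the defect (Table 3)."""
    return 100.0 * (1 - count / total)


print(round(preserved_ratio(513), 2))  # Zhang-Suen connectivity: 90.16
print(round(preserved_ratio(413), 2))  # Zhang-Suen one-pixel width: 72.58
print(round(inverted_ratio(6), 2))     # Voronoi-based spurious tails: 98.95
```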



Table 3. Results of checking spurious tails of skeletons
on the resulting images of the four selected thinning
algorithms.

Thinning method   Skeletons with spurious tails   Performance (%)
Zhang-Suen        569                             0.0
Voronoi-based     6                               98.95
SBMO              23                              95.95
TBMO              569                             0.0



Figure 7 shows the effect of producing spurious
tails on the selected four thinning algorithms.

Figure 8 shows examples of spurious tails produced by
the Zhang-Suen, Voronoi-based, SBMO and TBMO
methods.



Figure 7. The effect of producing spurious tails on the four
selected thinning methods.

The effectiveness is measured using the proposed
spurious-tail checking algorithm by the number of
thinned images that contain spurious tails.

Figure 8. Example of spurious tails produced by (a) the Zhang-
Suen (b) Voronoi-based (c) TBMO and (d) SBMO.



Based on Table 3, the Voronoi-based and SBMO
thinning algorithms are appropriate and effective, as
they generate few spurious tails, while the
Zhang-Suen and TBMO methods produced spurious
tails in every image.



D- Shape Information

An Arabic text thinning algorithm must be able to
deal with the various shapes and structures in text
images, such as curves, arcs, junctions, elongated
strokes, loops, and lines. To illustrate the impact of
the four selected methods on preserving the shape
information of Arabic text, the Arabic handwritten
word ‘Zanoush’ was chosen because it consists of
various shapes, including dots, junctions, arcs,
zigzags and loops. The skeletons of the word
‘Zanoush’ obtained by the four methods are shown in
Figure 9.



Based on Figure 9, the Voronoi-based, SBMO and
TBMO methods can deal with the various shapes and
structures in character images, such as curves,
junctions, arcs, loops, elongated strokes and lines.
Generally, the skeletons produced by the three
methods preserve the topology of the images,
although the shape may change slightly. Most
importantly, the three thinning algorithms still
preserve the abstraction of the image patterns even
though the size of the objects may change. This has to
be taken into consideration because the thinning
algorithm is applied to handwritten text and the
skeletons produced can be used in a character
recognition system. However, none of the four
thinning algorithms preserved the geometric aspect of
the images.

On the other hand, Figure 9 shows clearly that the
Zhang-Suen thinning algorithm did not preserve the
topology of the images. It made big changes to the
text features, especially at the terminals [9].



Figure 9. Arabic handwritten word ‘Zanoush’ thinned by
(b) Zhang-Suen (c) Voronoi-based (d) SBMO and (e) TBMO
methods.



E- Necking Problem

The necking problem arises when two strokes cross
each other, as in the shape of the English letter X
[13]. Thinning at crossover points is one of the main
challenges in the thinning process, since the point of
intersection should be maintained at the crossover.
The character Waw (و), located in the middle of the
word shown in Figure 3, is enlarged in Figure 10 to
illustrate how well the four selected algorithms thin
necking characters. As shown in Figure 10, none of
the methods was perfect in thinning the intersection
[9].




Figure 10. Necking problem produced by (b) Zhang-Suen (c) 
Voronoi-based (d) SBMO, and (e) TBMO methods. 



F- Speed 

The processing time for the five methods is calculated
in seconds based on the CPU time obtained during
program execution. Table 4 shows the overall average
processing time taken by the five thinning methods in
thinning the 569 images of the IFN/ENIT handwritten
dataset.



Table 4. The overall average processing time taken
by the five selected thinning methods

                   SPTA     Zhang-Suen   Voronoi-based   SBMO      TBMO
Total time (sec)   7.2662   5.0867       95.1059         15.7230   9.8267



Based on Table 4, the Zhang-Suen thinning algorithm
is the fastest of the five methods, while the
Voronoi-based thinning method is the slowest.



5. Summary of the Results

Table 5 summarizes the results of the comparison
done between the SPTA, Zhang-Suen, Voronoi-based,
SBMO and TBMO methods.

Figure 11 shows the performance of the five
selected methods on the IFN/ENIT handwritten
dataset in a chart.

Table 6 shows the performance of the five selected
methods, for IFN/ENIT handwritten dataset,
calculated based on effectiveness of the methods in



Figure 11. The performance of the five selected methods on the
IFN/ENIT handwritten dataset.



Based on Table 6, the Voronoi-based method
produced the best skeletons compared with the other
four selected thinning methods. On the IFN/ENIT
dataset, about 99.65% of its skeletons were
acceptable: they preserve shape and connectivity,
contain no spurious tails, and are one pixel wide.



6. Conclusion and Future Direction 

In this paper, five state-of-the-art thinning
algorithms were selected and implemented: SPTA,
the Zhang-Suen thinning algorithm, the
Voronoi-based thinning algorithm, and the thinning
and skeletonization methods based on morphological
operations. The five algorithms were applied to the
IFN/ENIT dataset. The results were discussed and
analyzed based on the following criteria: preserving
shape and text connectivity, preventing spurious
tails, maintaining a one-pixel-wide skeleton,
avoiding the necking problem, and running time. In
addition, performance measures for checking text
connectivity, detecting spurious tails and calculating
stroke thickness were proposed and carried out.

The Voronoi-based thinning algorithm produced the
best skeletons compared with the SPTA, Zhang-Suen,
TBMO and SBMO methods. On the IFN/ENIT
dataset, about 99.65% of its skeletons were
acceptable: they preserve shape and connectivity,
contain no spurious tails, and are one pixel wide. In
contrast, the SPTA, Zhang-Suen, SBMO and TBMO
methods produced about 0.0%, 57.53%, 98.65% and
61.86% acceptable skeletons, respectively, for the
same dataset. The Zhang-Suen thinning algorithm is
the fastest among the five selected algorithms, but it
does not preserve the shape information and it
produces spurious tails. SPTA failed in all
measurements and cannot be used to thin Arabic
text. The Voronoi-based method and the
skeletonization method based on morphological
operations (SBMO) are appropriate and effective for
thinning Arabic text. The thinning method based on
morphological operations (TBMO) preserves the
shape information, but it produces many spurious
tails. Finally, all five selected methods suffer from
necking problems on certain characters such as
Waw (و) and Meem (م).

In the future, it would be interesting to use the criteria
proposed in this paper to measure the effectiveness of
the thinning methods used in the segmentation and
baseline detection of Arabic text, and to improve these
methods with a robust Arabic thinning algorithm.
Abstract

There are many barriers in the sequence of steps needed
to generate verified code, and research has only recently
started to propose solutions for some of them. This paper
discusses all the steps briefly and explores bad
practices, namely anti-patterns, and their consequences
in UML class diagrams. The proposal relies on integrating
Event-B, as a formal method, with a pattern language; it
is a new methodology for detecting anti-patterns. The
proposal also integrates requirements, code and
verification in the system development life cycle. A
Satisfiability Modulo Theories (SMT) solver is suggested
to enhance the proposed approach. The benefits of the
proposal are a higher degree of automation, reduced proof
effort, detection of model anti-patterns, generation of
high-quality code, and reusability, specifically when
generating verified code from a UML class diagram model.
A case study of an automated teller machine (ATM) model
is presented to demonstrate the proposed methodology.

Keywords: Formal method, Event-B, Pattern, Anti-pattern,
SMT (Satisfiability Modulo Theories), Code generation,
RSM (Resource Standard Metrics), ATM (automated teller
machine).

1. Introduction 

Event-B is a language for modeling systems as closed
models. It is an extension of the B-method for the
specification, refinement and verification of complex
systems [1]. Nowadays there are many tools that generate
programming-language code from UML models, such as
"Altova UModel" [2]. Our scope in this paper, however, is
to guarantee that not only syntax errors but also
anti-pattern problems are avoided, and thereby to
guarantee high-quality code. There are many
classifications of UML anti-patterns, as in references
[3], [4] and [5]. The proposed approach uses the
following: a pattern language, proof obligations as a
criterion, an SMT solver as a prover, and the UML-B
converter. The Rodin platform [6] is suitable for
creating the model, and some tools, such as B2C# and
UML-B, are compatible with Rodin. The benefit of this
combination is not only the integration of requirements,
code and verification in the system development life
cycle, but also consistent refinement, verification with
high automation, model reusability, and detection of
anti-pattern problems. The proposed approach can also be
applied to any mathematical algorithm with some
modification; I will discuss this in detail in the
proposed approach section. An ATM model is presented to
demonstrate the proposed methodology.

The rest of this paper is organized as follows. Section 2
presents the related work. Section 3 introduces the
proposed approach. The proposal is applied to the ATM
model in Section 4. Sections 5 and 6 present the result
analysis and the conclusion. Finally, the references are
listed in Section 7.

2. Related works 

Software quality has been improved by several techniques.
Design patterns were presented as a good solution in [7].
Reference [8] presented a metric-based approach to detect
five types of UML anti-patterns, while [4] presented a
prototype approach to detect anti-patterns specifically
for MOF-based modeling languages. Reference [9] presented
a survey of the many formal methods that have been
proposed in recent years to improve software quality.
These methods include specification and modeling
languages as well as formal verification techniques such
as model checking and theorem proving.

3. The Proposed Approach 

This section briefly describes the phases, based on the
Rodin platform, that lead to the generation of verified
code. The proposed phases are shown in Figure 1. The
phases are discussed here for a UML class diagram model.
For an algorithm the phases are similar, but the
different kinds of algorithm anti-patterns need to be
discussed separately. The phases are described below.

The first phase converts the UML class diagram model to
an Event-B model using UML-B. UML-B provides package,
context, class and state machine diagrams for Event-B
[10]. The semantics of a UML-B model is given by the
Event-B generated by the UML-B tool according to a set of
translation rules. Each UML-B machine gives rise to both
an implicit Event-B context and a machine. The context is
used to define types for the classes and states in the
UML-B machine. UML-B components such as classes, class
attributes and associations become variables. Events and
transitions in classes and state machines become events
in the generated Event-B machine. Figure 2 gives an
abstract overview of UML-B. It presents, respectively, a
package diagram; a class diagram containing the classes
cls1 and cls2; the properties of a class; the properties
of the association at2; and a state machine stm of class
cls1.

In the generated Event-B machine, the classes cls1 and
cls2 give rise to variables. The class cls1 consists of
the attribute at1 of type ℕ and the association at2 of
type cls2.

The generated Event-B machine for machine 1 is 
shown in the Rodin screenshot of Figure 3. Each 
Event-B statement is preceded by its label, which 
describes its purpose. 



Figure 1. The proposed approach phases

Figure 2. UML-B abstract diagrams

[Figure 3: Rodin screenshot of the generated machine. The
INVARIANTS section contains typing invariants for cls1,
cls2, at1, at2 and the states i1 and i2; the EVENTS
section contains INITIALISATION and the events trs1 to
trs4, including the constructor and destructor of class
cls1.]

Figure 3. Generated Event-B specification of machine 1



The second phase is the detection of UML anti-pattern
problems. When constructing models, the designer should
be alert for anti-patterns and correct them, so that a
consistent and correct pattern is passed to the next
phases. An anti-pattern is a recurring kind of problem in
a software project. UML anti-patterns are classified into
three main classes: semantic, behavior and structure. Our
proposal detects the structure class by creating the
Event-B model and resolving the reported problems.

The third phase is the refinement approach in UML-B. The
refinement techniques concerning refined classes and
inherited attributes are described briefly, following
[10]. The benefit of refined classes and inherited
attributes is that they allow refinement to be performed
in Event-B: the refinement of classes and inherited
attributes in UML-B reflects the refinement of variables
in Event-B. A refined class is one that refines a more
abstract class, and an inherited attribute is one that
inherits an attribute of the abstract class. Some
elements of an abstract UML-B model need to be retained
by the refinement in UML-B.

According to [10], an Event-B refinement may drop some of
the variables and may introduce new ones. In a UML-B
refinement, a machine that refines a more abstract
machine may contain refined classes, where each refined
class refines a class of its abstract machine. The
motivation for refined state machines and refined states
comes from combining the state machine hierarchy in UML-B
with refinement in Event-B, so in a UML-B refinement a
machine may also contain refined state machines and
refined states.

The fourth phase uses a Satisfiability Modulo Theories
(SMT) solver. The SMT problem is the decision problem of
determining whether a given logic formula is satisfiable
with respect to a combination of theories expressed in
first-order logic. Theories of interest for this work
include uninterpreted functions with equality and integer
arithmetic. Since the validity of a proof obligation can
be decided by checking the unsatisfiability of its
negation, SMT solvers are natural candidates for
discharging the verification conditions generated in the
application of Event-B methods. SMT solvers can, for
example, handle a formula like

$$x < y \land y < x + f(x) \land P(h(x) - h(y)) \land \lnot P(0) \land f(x) = 0$$

which contains linear arithmetic on reals (0, +, -, <)
and uninterpreted symbols (P, h, f). SMT solvers use
decision procedures for the disjoint languages (for
instance, congruence closure for uninterpreted symbols
and simplex for linear arithmetic) and combine them to
build a decision procedure for the union of the
languages. They are also able to handle quantified
formulas and to construct certificates (proofs) of their
results. The SMT solver is used to increase the
proportion of automatic proofs in the Event-B model [11].
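
The link between validity and unsatisfiability can be
illustrated with a small sketch. It is not an SMT solver
and not part of the Rodin tool chain: it enumerates
Boolean assignments by brute force, and the function
names and the two example formulas are assumptions chosen
only for illustration.

```python
from itertools import product


def is_satisfiable(formula, names):
    """Return True if some Boolean assignment makes `formula` true."""
    return any(
        formula(dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )


def is_valid(formula, names):
    """A formula is valid exactly when its negation is unsatisfiable."""
    return not is_satisfiable(lambda env: not formula(env), names)


# Hypothetical proof obligation: (p and (p -> q)) -> q
def proof_obligation(env):
    p, q = env["p"], env["q"]
    return (not (p and ((not p) or q))) or q


# A formula that is satisfiable but not valid: p -> q
def not_a_theorem(env):
    return (not env["p"]) or env["q"]


print(is_valid(proof_obligation, ["p", "q"]))
print(is_valid(not_a_theorem, ["p", "q"]))
```

A real SMT solver replaces the exhaustive enumeration
with decision procedures for the theories involved, but
the way a proof obligation is discharged (negate, then
show unsatisfiable) is the same.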

The fifth phase is the design of Event-B patterns. Design
patterns in Event-B provide a methodological approach to
reusing former developments (referred to as patterns) in
a new development. The pattern approach is a tool that
helps inexperienced designers build domain models [12].
The proofs of a pattern can be reused too. It has been
shown that, in the special case where the pattern model
sees no context and its events have no parameters, the
generated refinement of the problem at hand is correct by
construction, so the proof obligations need not be
generated again. A correct matching between the problem
and the pattern is the criterion for correctness by
construction. Once all checks are done, the refinement of
the problem is generated by merging the pattern
refinement with the problem. Saving the resulting model
as a pattern also saves its proofs, which increases the
degree of automation.

The sixth phase is code generation. Automatic generation
of source code may be regarded as an "open-loop"
refinement step, in that no static equivalence check
against the previous refinement is possible. It is
divided into three steps: rewriting, converting and
building.

• The Rewrite step 

After the final refinement step, and once the final model
has been converted to a pattern, the model is rewritten
in an easily translatable subset of Event-B. First, each
constant in the context is converted to its literal
value. Range guards are converted to comparison
statements. Abstract sets are converted to their
numerical meanings via mapping functions. Global
variables are disallowed on the right-hand side of
assignments and must be restated as intermediate local
variables. Logical OR is not supported, so an event with
this type of guard is divided into two events. Events can
also be merged. This rewriting requirement does not
contribute to the translation process itself, but it
enforces the division of events into their simplest form.
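
The division of an event with a disjunctive guard can be
sketched as follows. This is an illustrative model only:
the `Event` record, the textual " or " separator and the
example guard are assumptions, and real Event-B
predicates would need proper parsing rather than string
splitting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    name: str
    guards: List[str]   # guards are implicitly conjoined
    actions: List[str]


def split_disjunctive_guard(event: Event, separator: str = " or ") -> List[Event]:
    """Return one event per disjunct so that no guard contains a logical OR.

    Assumes each guard is a flat disjunction with no nested parentheses.
    """
    result = [Event(event.name, [], list(event.actions))]
    for guard in event.guards:
        disjuncts = [d.strip() for d in guard.split(separator)]
        result = [
            Event(e.name, e.guards + [d], list(e.actions))
            for e in result
            for d in disjuncts
        ]
    if len(result) > 1:
        # Give each resulting event a distinct name.
        result = [
            Event(f"{e.name}_{i}", e.guards, e.actions)
            for i, e in enumerate(result, 1)
        ]
    return result


# Hypothetical event with a disjunctive guard.
withdraw = Event(
    "withdraw",
    ["am <= bal or overdraft = TRUE", "am > 0"],
    ["bal := bal - am"],
)
for part in split_disjunctive_guard(withdraw):
    print(part.name, part.guards)
```

Each resulting event keeps the original actions, so the
set of enabled behaviours is unchanged while every guard
becomes a plain conjunction.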

• The converter step 

Once the pattern has been rewritten manually, all
selected events are translated automatically, without
explicit intervention. A single file is produced for each
so-called "leaf machine". For example, the B2C# converter
is invoked by a single user action. This step is divided
into two parts: the first is event translation and the
second is the generation of the calling function.

• The Build step 

Once the automatic conversion of the Event-B model is
complete, an execution environment must be provided and
the code compiled with a suitable development tool chain.
Implementations must be provided for all uninterpreted
functions, such as instrumentation and deadlock
functions. The generated file depends on the definitions
file "EventbIncludes.h", whose inclusion is inserted
automatically. A top-level main function must be provided
to call the generated functions "INITIALISATION" and
"Iterate".
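
The role of the top-level main function can be sketched
as below. The sketch is written in Python for brevity
rather than in the generated language, and the
`GeneratedMachine` class is a hypothetical stand-in for a
generated leaf machine; only the names INITIALISATION and
Iterate come from the text. It assumes that Iterate
returns false when no event guard holds (deadlock).

```python
class GeneratedMachine:
    """Hypothetical stand-in for a generated leaf machine."""

    def __init__(self):
        self.counter = None

    def INITIALISATION(self) -> bool:
        # Establish the initial state of the machine variables.
        self.counter = 0
        return True

    def Iterate(self) -> bool:
        """Fire one enabled event; return False when none is enabled."""
        if self.counter < 3:
            self.counter += 1
            return True
        return False


def main(machine, max_steps=1000):
    """Call INITIALISATION once, then Iterate until deadlock or the step limit."""
    if not machine.INITIALISATION():
        raise RuntimeError("initialisation failed")
    steps = 0
    while steps < max_steps and machine.Iterate():
        steps += 1
    return steps


print(main(GeneratedMachine()))
```

The step limit is an added safeguard for the sketch; a
real driver would supply its own deadlock handling.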



4. Case Studies 

The ATM is a widely used application that many
researchers have studied in various fields. An automated
teller machine (ATM) is a machine that allows bank
customers to carry out some banking transactions 24 hours
a day. The practical application was created with the
Rodin platform. The first phase converts the ATM class
diagram to an Event-B model. Figure 4 shows the UML-B
specification of the ATM abstract machine. The abstract
machine consists of a class Account with its attribute
BAL and four events, namely createAccount, deposit,
withdraw and checkBalance. The Account class represents
the set of accounts that currently exist in the system.
The attribute BAL represents the balance of an account.
The withdraw event has a parameter am of type natural
number. The parameter is shown in the property view in
Figure 4, which also includes the guard and the action.
self is the self-name property defined for the class
Account. If the amount am is less than or equal to the
balance of the account, the withdraw event can occur.
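
The behaviour of the abstract machine described above can
be mimicked with a short sketch. It is an informal Python
model, not the UML-B or Event-B specification itself; the
class and method names are assumptions, and each method
returns False when its guard does not hold, so that the
event simply does not occur.

```python
class AtmAbstractMachine:
    """Informal model of the ATM abstract machine (names are illustrative)."""

    def __init__(self):
        # Maps each existing account to its balance (attribute BAL).
        self.bal = {}

    def create_account(self, account) -> bool:
        if account in self.bal:          # guard: the account must be new
            return False
        self.bal[account] = 0
        return True

    def deposit(self, account, am: int) -> bool:
        if account not in self.bal or am < 0:
            return False
        self.bal[account] += am
        return True

    def withdraw(self, account, am: int) -> bool:
        # guard: am is a natural number and am <= BAL(account)
        if account not in self.bal or am < 0 or am > self.bal[account]:
            return False
        self.bal[account] -= am
        return True

    def check_balance(self, account):
        return self.bal.get(account)


atm = AtmAbstractMachine()
atm.create_account("acc1")
atm.deposit("acc1", 100)
print(atm.withdraw("acc1", 150), atm.check_balance("acc1"))
print(atm.withdraw("acc1", 40), atm.check_balance("acc1"))
```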

Figure 4. UML-B specification of the ATM abstract machine

The second phase detects anti-patterns so that they can
be resolved. The following examples present some
structure anti-patterns. Figure 5 presents the
anti-pattern of a lost association in a package diagram;
the error description list leads the designer to it.
Figure 6 shows the "Parameter Has Invalid Default Data
Type" anti-pattern and how Event-B detects it. Another
example is the "Invalid Expression Constraint"
anti-pattern, shown in Figure 7.



Figure 5. Lost association in package diagram anti-pattern

Figure 6. Parameter Has Invalid Default Data Type anti-pattern

Event-B detects anti-patterns and also helps the designer
to resolve them, as shown in Figure 8. When the designer
selects a row in the problem list, the exact location of
the error in the Event-B text is shown.

We continued detecting anti-patterns until the problem
list was empty, as shown in Figure 9, which presents the
final iteration of the second phase.

Figure 9. The final iteration in the second phase

The third phase generates and refines the ATM model.
Seven refinements of the ATM model are discussed in more
detail in [13].

The fourth phase applies the SMT solver to increase the
degree of automation by decreasing the number of proof
obligations that must be discharged manually. We used
SMT solvers for the automatic verification of proof
obligations. The plug-in communicates with the SMT
solvers using files and operating system commands.

The configuration of the plug-in includes a choice of SMT
solvers. The plug-in is available to the formal methods
community as an exploratory package through Rodin's
official source code repository. Currently, verification
with the SMT solver has to be activated by clicking a
button, as shown in Figure 10. When the verification is
successful, as shown in Figure 11, the status of the
proof obligation is updated and the user may move on to
the next proof obligation.

The fifth phase is the design pattern phase. The plug-in
provides a wizard that takes the user through the steps
of matching, syntax checking, renaming and incorporating.
The details of applying a pattern can be found in [8].
This phase is important for saving proofs.



Figure 8. Event-B support for resolving structure anti-patterns

Event-B                  Generated code
-----------------------  ----------------------
skip                     /* No action */
x ≥ y                    if(x>=y) {
x := y + z               x = y + z;
x := F(y)                F(y,&x);
(a↦b) := F(x↦y)          F(x,y,&a,&b);
x := a(y)                x = a(y);
x := y                   x = y;

Table 1. Sample of the rewriting table

Figure 10. Screenshot of the proof before the SMT proof: the
button is active and the status is "not proved"




The sixth phase is the automatic C# code generation for
the ATM pattern model. The strict precondition for
starting this phase is that the received model is 100%
proved, so we used the SMT solver to discharge as many
proof obligations as possible and thus reduce the manual
proof effort.

The three code generation steps are now presented briefly
for C#, using the B2C# plug-in; they are the same for the
Java programming language. The difference from one
language to another lies in the first, manual step, the
rewriting step, in which the designer makes changes such
as those in Table 1. This provides static traceability
between the model and the source code, and dynamic
traceability between the model and the executable file.
Generating recursive or iterative loops poses no problem;
iteration continues until a deadlock condition is
detected automatically.

The second step converts the ATM pattern. The event
converter translates each event of the Event-B ATM model
into an individual C# function. See the example in
Figure 12. After the designer has converted the events,
Calling




Figure 12 Event converter for Transaction event 

Finally, in the build step, a top-level C# main function
must be provided to call the generated functions
"INITIALISATION" and "Iterate". The INITIALISATION
function of the ATM model must be called before Iterate.



Figure 11. Screenshot of the proof after a successful proof:
the button has been deactivated and the status is "proved"

    /* Translation Begins Here */
    using System;
    using System.Collections.Generic;
    public class mini_1 {
        /* Global Constants defined of [mini_ctx] */
        const long[] l = new long[1000]; /* C# array
            declaration when constant is given in range of
            predicate */
        /* No translatable type found for [n] */
        /* Global variables defined in [mini_1.mch] */
        ulong m; /* Integer in range undefined */
        int p;   /* Integer in range */
        int q;   /* Integer in range */
        public bool INITIALISATION()
        public bool Iterate()


Figure 13 Part of ATM C# derived code 

Table 2. Proof obligations of the ATM refinement models after
applying the proposed phases



Avg eLOC ..............: 20.00
Avg Cyclomatic Comp. ..: 4.33
Avg Parameters ........: 0.00
Avg Comment Lines .....: 3.67
Avg Physical Lines ....: 32.33
Avg LOC ...............: 23.00
Avg lLOC ..............: 15.00
Avg Interface Comp. ...: 1.00
Avg Return Points .....: 1.00



5. Analysis of the Results 

Table 2 shows the proof obligation (PO) statistics
generated from the different ATM UML-B models. We note
that Rodin alone cannot prove the model 100%
automatically, so code cannot be generated directly from
the Rodin Event-B model. The approach proposed in this
paper increases the number of proof obligations that are
discharged automatically and saves the proofs.

We also compared manual and automatic code generation
using RSM (Resource Standard Metrics), a quality analysis
tool [14], applied to the ATM refinement models. We note
that the average complexity decreases by 60% and that all
the other RSM features are improved in the automatically
generated code of our approach. This is shown in Figures
14a and 14b.



Figure 14a. RSM quality report for the automatically generated
ATM code for the first refinement

Avg eLOC ..............: 5.00
Avg Cyclomatic Comp. ..: 2.60
Avg Parameters ........: 0.00
Avg Comment Lines .....: 1.70
Avg Physical Lines ....: 12.70
Avg LOC ...............: 7.80
Avg lLOC ..............: 4.10
Avg Interface Comp. ...: 2.40
Avg Return Points .....: 2.40

Figure 14b. RSM quality report for the manual ATM code for the
first refinement

6. Conclusions and Further Work

In this paper I proposed an approach that integrates
Event-B and patterns to enhance software quality and
detect anti-pattern problems. The proposed approach
improves the discharge of proof obligations by means of
an SMT solver. The benefit of applying the proposed
approach is not only the integration of requirements,
code and verification in the system development life
cycle, but also consistent refinement, verification with
high automation, model reusability, and detection of
anti-pattern problems. When I applied the proposed phases
to the ATM refinements, I concluded, using the proof
obligation criterion, that the proposed phases increase
the degree of automation and allow 100% of the proofs to
be saved. They also detect structure anti-patterns in the
UML class diagram model. Finally, the average complexity
and all the other features in the RSM report show the
increased quality of the automatically generated code. In
the future, we are going to address code anti-patterns
and propose a new approach to detect semantic UML
anti-patterns. We will also explore the link between
formal methods and other UML diagrams.

Abstract

In the present-day power scenario, Unit Commitment (UC)
is one of the most complex and challenging tasks for
power system operators. UC is a nonlinear, non-convex,
large-scale, mixed-integer problem. To address it, this
paper presents a hybrid Differential Evolution algorithm
with a local search technique and an adaptive crossover
using a triangular distribution factor (DE-TCR). The
salient features of the proposed DE-TCR are as follows.
An intelligent chromosome representation is used whose
length is independent of the number of units in the UC
problem, thereby reducing the length of the chromosome.
The crossover probability is linked to the
non-separability and decision-variable dependency of UC
problems. Local search uses Sequential Quadratic
Programming (SQP), which has been shown to improve the
performance of the classical DE algorithm. Initially, the
proposed DE-TCR is used to determine an optimal
generation schedule for each hourly demand. Later, SQP is
used to find the optimal dispatch strategy that minimizes
the fuel cost.

The effectiveness of the proposed algorithm is tested on
standard 4-unit, 8-hour and 10-unit, 24-hour UC systems.
Results demonstrate that the proposed algorithm performs
better than other reported methods and can produce
globally optimal solutions.

Keywords: Unit Commitment, Differential Evolution,
Sequential Quadratic Programming, Adaptive crossover,
Triangular distribution factor.

1. Introduction 

Generation scheduling, or Unit Commitment (UC), is the
problem of finding the optimal set of generators to be
turned on and off according to the time-varying load
demand, so that the solution is globally optimal in
economic terms. The solution should also satisfy the
standard UC constraints such as spinning reserve, minimum
up and down times of units, ramp rate limits, crew
constraints, etc. In the present-day power scenario,
under deregulation, the generation scheduling problem is
highly complex because of the increase in the number of
generators, the type and size of generating facilities,
and the time-varying load demand. Further, from the
optimal schedule obtained for the UC problem, a dispatch
strategy should be found among the committed generators
that gives the minimum fuel cost. This problem is known
as Economic Load Dispatch (ELD). As a result, the focus
for power system engineers today is to find a global
optimization strategy that can solve the UC and ELD
problems in the most economical and reliable manner [1].

The literature offers many approaches to the complex UC
and ELD problems, each with its own advantages and
disadvantages [2]. In general, these methods can be
classified as conventional, heuristic, and hybrids of the
two. Conventional methods include Exhaustive Enumeration
[3], Priority Listing [4-5], Dynamic Programming (DP)
[6-7], Integer Programming [8-9], Branch and Bound [10],
Lagrangian Relaxation (LR) [11] and Sequential Quadratic
Programming (SQP) [12]. These conventional approaches
suffer from the dimensionality of the problem, and they
cannot handle the constraints of the UC problem
effectively, which in many cases results in suboptimal
solutions. Heuristic approaches include Tabu Search (TS)
[13], Simulated Annealing (SA) [14], Expert Systems [15],
Fuzzy Systems [16], Artificial Neural Networks (ANN)
[17], Genetic Algorithm (GA) [18-19], Evolutionary
Programming (EP) [20-21], Ant Colony Systems (ACS)
[22-23], Particle Swarm Optimization [24], Differential
Evolution (DE) [25-26], Self-Adaptive DE [27], Bacterial
Foraging (BF) [28] and the Imperialistic Competition
Algorithm (ICA) [29]. Although these methods overcome the
drawbacks of the conventional approaches, they suffer
from high computational time. At present, hybrids of
heuristic and conventional methods are used to solve the
UC and ELD problems. These hybrid methods include GA-PSO
[30], GA-SaDE [31] and the Fuzzy-assisted Cuckoo search
algorithm [32]. They exploit the advantages of both
conventional and heuristic search, and so handle the
problem better even when all UC and ELD constraints are
considered. The crux of a hybrid approach lies in how the
methods are hybridized, which is why performance varies
from one hybrid to another. Among these hybrid methods,
algorithms based on Differential Evolution (DE) are the
simplest and most effective [33]. However, DE shares some
drawbacks with GA, among them: restriction to problems
with a limited number of control variables, no guarantee
that the solution is globally optimal, and higher
computational time. Recently, Differential Evolution with
Adaptive Crossover based on a Triangular distribution
Factor (DE-TCR) [34] has been applied to various complex
optimization problems. The DE-TCR approach has the
following merits:

• The method avoids the unnecessary increase 
in the size of DE chromosome. 

• Problem- specific operators incorporated in 
the DE method makes the method suitable 
for solving UC problem. 

In this paper, DE-TCR is applied to the UC problem and
SQP is used for the ELD problem. Simulation results
obtained with DE-TCR on various optimization problems are
found to be promising.

The remaining paper is organized as follows: 
Section 2 deals with the formulation of UC-ELD 
problem. Section 3 describes the intelligent 
chromosome representation and DE-TCR 
algorithm. Section 4 deals with the implementation 
of proposed DE-TCR and SQP for UC-ELD 
problem. Numerical results and discussion are 
presented in Section 5. Finally conclusions are 
drawn in Section 6. 

2. Formulation of UC- ELD Problem 

The objective of the UC-ELD problem is to minimize the
function in (1), which is composed of the unit operating
cost (fuel cost) and the start-up cost of each unit at
any given time. The Combined Objective Function (COF) is
given below [1]:

$$\min \mathrm{COF} = \sum_{t=1}^{NT} \sum_{i=1}^{NG} \left[ FC_{i,t}\, U_{i,t} + SC_{i,t}\,(1 - U_{i,t-1})\, U_{i,t} \right] \quad (1)$$

(i) Power balance constraint

$$\sum_{i=1}^{NG} P_{i,t}\, U_{i,t} = PD_t$$

(iv) Spinning reserve constraint

$$\sum_{i=1}^{NG} P_{i,\max}\, U_{i,t} \ge PD_t + R_t \quad (5)$$

(v) Ramp up/down rates

$$P_{i,\max}(t) = \min\{ P_{i,\max},\; P_{i,t-1} + \tau\, RU_i \}$$
$$P_{i,\min}(t) = \max\{ P_{i,\min},\; P_{i,t-1} - \tau\, RD_i \}$$

where $FC_{i,t} = a_i + b_i P_{i,t} + c_i P_{i,t}^2$ is the fuel
or production cost of the i-th unit at a given time t;
$a_i$, $b_i$ and $c_i$ are the cost coefficients; $P_{i,t}$ is
the amount of real power generated by the i-th unit;
$SC_{i,t}$ is the start-up cost of the i-th unit at time t;
$U_{i,t}$ denotes the unit ON/OFF status, with $U_{i,t} = 1$
representing the ON state and $U_{i,t} = 0$ the OFF state
of the i-th unit at time t; $PD_t$ is the real power demand
at time t; $T^{on}_{i,t}$ is the total time for which the
i-th unit has been ON at time t; $T^{off}_{i,t}$ is the
total time for which the i-th unit has been OFF at time t;
$MUT_i$ is the minimum up time of the i-th unit; $MDT_i$ is
the minimum down time of the i-th unit; $P_{i,\max}$ is the
maximum power limit of the i-th unit; $P_{i,\min}$ is the
minimum power limit of the i-th unit; $RU_i$ is the ramp up
rate of the i-th unit; $RD_i$ is the ramp down rate of the
i-th unit; and $\tau$ is the UC time step (60 min).
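
As a worked illustration of the objective function, the
sketch below evaluates the combined cost for a small
schedule. It is a minimal sketch under stated
assumptions: the data layout (lists indexed by hour, then
unit), the use of a constant start-up cost per unit, the
initial status vector and all numerical values are
hypothetical and are not taken from the test systems of
this paper.

```python
def fuel_cost(a, b, c, p):
    """Quadratic fuel cost FC = a + b*P + c*P^2 of one unit."""
    return a + b * p + c * p * p


def combined_objective(units, P, U, U_initial):
    """Sum fuel and start-up costs over all hours and units.

    units     : list of dicts with keys "a", "b", "c", "sc"
    P[t][i]   : power generated by unit i in hour t
    U[t][i]   : 1 if unit i is ON in hour t, else 0
    U_initial : ON/OFF status of each unit before the first hour
    """
    total = 0.0
    for t in range(len(U)):
        for i, unit in enumerate(units):
            previous = U_initial[i] if t == 0 else U[t - 1][i]
            fc = fuel_cost(unit["a"], unit["b"], unit["c"], P[t][i])
            # Start-up cost applies only when the unit goes from OFF to ON.
            total += fc * U[t][i] + unit["sc"] * (1 - previous) * U[t][i]
    return total


# Hypothetical two-unit, two-hour example.
units = [
    {"a": 100.0, "b": 2.0, "c": 0.01, "sc": 50.0},
    {"a": 200.0, "b": 3.0, "c": 0.02, "sc": 80.0},
]
U = [[1, 0], [1, 1]]
P = [[100.0, 0.0], [100.0, 50.0]]
print(combined_objective(units, P, U, U_initial=[0, 0]))
```

In the example, unit 1 costs 400 per hour plus a start-up
cost of 50 in the first hour, and unit 2 costs 400 plus a
start-up cost of 80 in the second hour, for a total of
1330.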

3. General Purpose DE-TCR Algorithm 

The global optimal solution of the UC problem is very
difficult to obtain, as the search has to be carried out
over a large population. The proposed DE-TCR algorithm
applied to UC has the following requirements:

1. The chromosome must be represented intelligently, so
that the chromosome size and the population size are
reduced.

2. Problem-specific operators must be designed and
incorporated effectively in the DE method, so that fewer
generations and smaller populations are needed.

3. A local search method is needed that concentrates on
feasible solutions and avoids infeasible ones.

This section describes the design of the chromosome and
of the problem-specific operators, such as the mutation
and crossover operators. It also discusses the
modifications to the operators that lead to the DE-TCR
approach. Finally, the general-purpose DE-TCR and SQP
algorithms are presented.

3.1 Intelligent chromosome representation

The quality of a solution depends not only on the
algorithm used, but also on the way the problem is
encoded in the algorithm. Therefore, to improve the
efficiency of the algorithm, an intelligent solution
(chromosome) representation is used, as explained below.

• Conventional coding 

In the conventional approach, the length of the decision
variable is equal to the product of the number of
generators (NG) and the total number of hours (NT) for
which the power demand schedule is known. For example, if
NG = 10 and NT = 24 hours, the length of the decision
variable is 240, as shown in Figure 1.

[Figure 1: for each hour from the 1st to the 24th, ten
binary entries (0 or 1), numbered 1 to 240; 240 decision
variables in total, where '0' means unit OFF and '1'
means unit ON.]

Figure 1: Coding of decision variable in the conventional
approach



variable represents the decimal number 0.7321, then this
value is multiplied by 1023 (the decimal equivalent of
the binary number '1111111111', in which all units are
ON), giving 748 after truncation, which is '1011101100'
in binary, as shown in Figure 2.



Time 


1 hr 


2 hr 


3 hr 
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0-1 


0-1 


0-1 




0-1 
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ON/OFF 


Dec 


dec 


Dec 




dec 


dec 



Figure 2: Representation of Decision Variable in the 
proposed approach 

This approach ensures that the maximum length of 
decision variable for any size of system will not exceed 
24. Further the search space is limited to ‘O’ to ‘1’ 
which makes the algorithm stronger in landing at global 
optimal value with reasonable population size and 
generations. 

3.2 Algorithm of DE-TCR 

DE is developed by Storn and Price in 1995 [35]. The 
major advantage of DE is its simple theoretical frame 
work; relatively few control variables and require less 
computational time with proven convergence to global 
optimality. In this section the fundamental algorithm of 
DE and the modified versions of DE known as SaDE 
and DE-TCR are presented. 

3.2.1 The fundamental DE Algorithm 

The fundamental DE algorithm is as explained below: 

• Population Initialization 

Similar to other Evolutionary Algorithms, DE starts 
with a randomly generated initial population with N 
dimension of size NP. Each individual v is within 
the interval [lb, ub]. 

• Mutation 

For each individual, select 3 random other 
individuals ( X x , X 2 and X 3 ) from the population. 

Using one individual say X, as base and other two 

X 2 and X 3 as differentiation vector, form the 
mutant vector V as 



With the increase in number of generators, the 
length of decision variable increases accordingly 
which will produce a detriment effect in the 
solution by either introducing slower convergence 
or landing in sub optimal solution. 

• Intelligent coding 

In this approach a decimal number in between ‘0’ 
to ‘1’ is coded in each segment of decision variable. 
This value is multiplied with the decimal equivalent 
to the size of generators in binary form. Then this 
decimal number is converted to its equivalent 
binary of length equal to the size of generators. For 
example if NG =10 and NT =24 then the length of 
decision variable is 24. Say, a segment of decision 



v = x x +f(x 2 -x 3 ) 



(7) 



Where f is a scale factor which controls the 
difference vector. 

• Cross Over

DE utilizes uniform crossover to generate a new individual z using

$$ z = \begin{cases} v & \text{if } rand < C_r \\ x & \text{otherwise} \end{cases} \qquad (8) $$

where rand is a random number taken from the interval [0, 1] and C_r is the crossover probability. If z exceeds the boundary limits, it is reinitialized.

• Selection 

The above process is repeated for every individual and the fitness of each new individual is evaluated. If its function value is better than that of the old individual, the new individual replaces the old one. The complete process of mutation, crossover and selection is repeated until the maximum number of generations is reached or convergence is
attained. 
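
The three operators above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' MATLAB implementation: the function name `de_minimize`, the default values of `f`, `cr`, `pop_size` and `max_gen`, the fixed random seed, and the gene-by-gene application of (8) are all assumptions made for the example.

```python
import random


def de_minimize(fitness, lb, ub, n_dim, pop_size=20, f=0.5, cr=0.9,
                max_gen=200, seed=0):
    """Basic DE: mutation as in (7), crossover as in (8), greedy selection."""
    rng = random.Random(seed)
    # Population initialization: NP individuals, each inside [lb, ub].
    pop = [[rng.uniform(lb, ub) for _ in range(n_dim)]
           for _ in range(pop_size)]
    cost = [fitness(x) for x in pop]
    for _ in range(max_gen):
        for i in range(pop_size):
            # Three distinct individuals, all different from individual i.
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.sample(others, 3)
            # Mutation (7): base vector plus scaled difference vector.
            v = [pop[r1][d] + f * (pop[r2][d] - pop[r3][d])
                 for d in range(n_dim)]
            # Crossover (8), applied per gene (an assumption of this sketch).
            z = [v[d] if rng.random() < cr else pop[i][d]
                 for d in range(n_dim)]
            # Genes that leave the bounds are reinitialized.
            z = [g if lb <= g <= ub else rng.uniform(lb, ub) for g in z]
            # Selection: the child replaces the parent only if it is better.
            c = fitness(z)
            if c < cost[i]:
                pop[i], cost[i] = z, c
    best = min(range(pop_size), key=lambda j: cost[j])
    return pop[best], cost[best]
```

For instance, `de_minimize(lambda x: sum(g * g for g in x), -5.0, 5.0, 3)` minimizes a three-dimensional sphere function within the bounds [-5, 5].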



3.2.2 Self Adaptive DE Algorithm
Though DE outperforms other known heuristic methods in most optimization problems, it suffers from a drawback in the conventional handling of the crossover parameter 'C_r'. In the conventional procedure the value of 'C_r' remains constant, which may lead to locally optimal values. Hence a self adaptive DE (SaDE) was introduced in [31]. In this procedure the values of the control parameters 'f' and 'C_r' are varied during the run, based on the evolution. Both values are attached to the individual chromosome. For each individual these control parameters are changed as follows

$$ f_{i,g+1} = \begin{cases} f_l + rand_1 \times f_u & \text{if } rand_2 < \tau_1 \\ f_{i,g} & \text{otherwise} \end{cases} \qquad (9) $$

and

$$ Cr_{i,g+1} = \begin{cases} rand_3 & \text{if } rand_4 < \tau_2 \\ Cr_{i,g} & \text{otherwise} \end{cases} \qquad (10) $$

where \tau_1 = 0.1 and \tau_2 = 0.1 are the probabilities of adjusting the control parameters 'f' and 'C_r', and f_l = -1.5 and f_u = 1.5 are the lower and upper bounds. The updated values of 'f' and 'C_r' are obtained before the mutation operation, so they influence mutation, crossover and selection. This adaptation of the DE operators helps the algorithm to converge much faster when
compared to the conventional DE algorithm. 
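
The update rules (9) and (10) amount to a few lines per individual. The sketch below is illustrative only: the function name `adapt_parameters` and its argument layout are hypothetical, while the default values of `f_l`, `f_u`, `tau1` and `tau2` are the ones quoted in the text.

```python
import random


def adapt_parameters(f_old, cr_old, rng, f_l=-1.5, f_u=1.5,
                     tau1=0.1, tau2=0.1):
    """Return the (f, C_r) pair of one individual for the next generation."""
    rand1, rand2, rand3, rand4 = (rng.random() for _ in range(4))
    # Rule (9): with probability tau1 draw a new scale factor.
    f_new = f_l + rand1 * f_u if rand2 < tau1 else f_old
    # Rule (10): with probability tau2 draw a new crossover probability.
    cr_new = rand3 if rand4 < tau2 else cr_old
    return f_new, cr_new
```

Each individual carries its own pair of parameters, and the function is called once per individual before mutation.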



3.2.3 DE with Triangular distribution factor for Crossover (DE-TCR)

From [34] it is understood that a low value of 'C_r' is suggested for separable problems and a high value for non-separable problems. SaDE does not take this factor into account. Hence a novel DE algorithm with a triangular distribution factor for 'f' and 'C_r', chosen according to the complexity of the problem, is introduced in [34]. The salient modules of the proposed DE-TCR are (a) adaptive crossover and (b) local search using SQP.

3.2.3.1 Adaptive Cross Over with triangular distribution factor

Initially, DE-TCR considers three (triangular) distribution factors for 'C_r', namely the minimum (mincr), median (medcr) and maximum (maxcr), in the range '0' to '1'. Then, based on a random number ('rand', between 0 and 1), every chromosome is assigned its own 'C_r' value using the formula given below.

If rand < [(medcr - mincr) x (2/(maxcr - mincr)) x 0.5]
    $C_r = mincr + \sqrt{rand \times (maxcr - mincr) \times (medcr - mincr)}$
else
    $C_r = maxcr - \sqrt{(1 - rand) \times (maxcr - mincr) \times (maxcr - medcr)}$
                                                                  (11)

At the end of every generation the adaptive mechanism changes the triangular values of 'C_r' based on the number of successes recorded, i.e. if the number of successes recorded is greater than the success rate. A success is obtained when the fitness value of a child (offspring) is better than that of its parent, in which case the child replaces the parent in the next generation. The success rate is fixed at the start of the generations as a fraction of the total number of chromosomes (the population set). For each success recorded, the corresponding 'C_r' value is also stored in a variable 'successcr'. At the end of each generation, the triangular distribution factor for crossover is adapted according to the rule given below.

If success recorded > success rate
    successcr1 = sort in ascending (successcr)
    mincr = first element of successcr1;
    medcr = mid element of successcr1;
    maxcr = end element of successcr1;
end
                                                                  (12)

In the next generation, the triangular distribution factors adapted using (12) are used in (11) to compute new values of 'C_r' for each chromosome. On the other hand, the distribution factor for 'f' used to compute the mutant vector in (7) is fixed throughout the process. Depending on the value of 'C_r', this adaptive mechanism selects the type of crossover as given below:

If C_r < 0.5 - Binomial cross over
If 0.5 < C_r < 0.95 - Combines Binomial and Line recombination based on a random number
If C_r > 0.95 - Line recombination cross over

Hence this adaptive mechanism is able to detect non-separable problems through high values of 'C_r'. The algorithm is also able to detect a strong dependency among the decision variables and to use the rotationally invariant
line recombination. 
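
The sampling rule (11), the adaptation rule (12) and the choice of crossover type can be sketched together as follows. The function names are hypothetical, and three details are assumptions of this sketch rather than statements from the text: the "mid element" is taken as the element at index len // 2 of the sorted list, a degenerate triangle (maxcr equal to mincr) simply returns mincr, and in the intermediate range the two crossover types are chosen with equal probability.

```python
import math
import random


def sample_triangular(lo, mid, hi, rand):
    """Draw a value from the triangle (lo, mid, hi) as in (11)."""
    if hi == lo:  # degenerate triangle, guarded here by assumption
        return lo
    if rand < (mid - lo) * (2.0 / (hi - lo)) * 0.5:
        return lo + math.sqrt(rand * (hi - lo) * (mid - lo))
    return hi - math.sqrt((1.0 - rand) * (hi - lo) * (hi - mid))


def adapt_triangle(success_cr, success_rate, current):
    """Rule (12): rebuild (mincr, medcr, maxcr) from the successful C_r values."""
    if len(success_cr) > success_rate:
        ordered = sorted(success_cr)
        return ordered[0], ordered[len(ordered) // 2], ordered[-1]
    return current


def crossover_type(cr, rng):
    """Pick the crossover operator from the value of C_r."""
    if cr < 0.5:
        return "binomial"
    if cr > 0.95:
        return "line"
    # Intermediate range: mix of both, 50/50 by assumption.
    return "binomial" if rng.random() < 0.5 else "line"
```

With the triangle [0.2 0.5 1] and rand = 0.1472, `sample_triangular` returns approximately 0.3880, which falls below 0.5 and therefore selects binomial crossover.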

Further, a population refreshment technique is applied when the interquartile range (IQRrange) of the population falls below the population refreshment threshold. The interquartile range (IQRrange) is the difference between the 75th and 25th percentiles of the population.
3.2.3.2 Local search using SQP algorithm

The sequential quadratic programming (SQP) [36] routine is executed on every child of the population with a probability \alpha_LS. Local search strategies have been shown to improve the performance of the classical DE algorithm [37]. The probability of applying local search to a child is given by the following formula.

\alpha_LS = (…) / (100 x no. of decision variables)               (13)

The condition to be satisfied to perform local search on each child is as given below.

If rand < \alpha_LS
    perform local search using SQP algorithm
end
                                                                  (14)

where 'rand' is a random number in the interval [0, 1]. The detailed procedure of the general purpose SQP algorithm is explained in section 3.2.5.



3.2.4 General Purpose DE-TCR Algorithm

The general purpose DE-TCR algorithm is given below.

Step 1: Initialize a population of size NP with N dimensions (decision variables). Each individual x is randomly selected from the search space within the interval [lb, ub].

Step 2: Evaluate the initial population.

Step 3: For k = 1 to the maximum number of generations, or until the convergence criterion is reached:

Step 3.1: Apply the fundamental DE operators on NP to get the offspring.

1) Perform the mutation operation with 'f' selected from the triangular distribution.

2) Perform the crossover operation with 'C_r' selected from the triangular distribution.

Step 3.2: For every child, perform a local search with a probability of \alpha_LS.

Step 3.3: Evaluate the offspring; if child < parent, the parent is substituted by its child.

Step 3.4: Record the number of successes. If the cumulative number of successes is bigger than r_success, reset the cumulative number and recalculate the triangular distribution for 'C_r'.

Step 3.5: If IQR < V, refresh the population with x_median ± (…), where V = (x*_upper - x*_lower) / \gamma_var; x*_upper and x*_lower are the bounds of the chromosomes since the last population refreshment, and \gamma_var is the threshold of IQR, between 0 and 1.

Step 4: Algorithm terminates.

3.2.5 General Purpose SQP Algorithm

In general, a nonlinear constrained optimization problem is defined as given below

$$ \min f(x) \qquad (15) $$

subject to

$$ g_i(x) = 0 \quad \text{for } i = 1, 2, \ldots, m_e \qquad (16) $$

$$ g_j(x) \ge 0 \quad \text{for } j = m_e + 1, \ldots, m \qquad (17) $$

where x is the vector of the n design parameters, f(x) is the objective function, which returns a scalar value, and the vector function g(x) returns a vector of length m containing the values of the equality and inequality constraints evaluated at x. For the problem described in (15)-(17) the principal idea is the formulation of a Quadratic Programming (QP) sub problem based on a quadratic approximation of the Lagrangian function

$$ L(x, \lambda) = f(x) - \lambda^T g(x) \qquad (18) $$

and let the Jacobian of the constraints be

$$ A(x)^T = \nabla g(x)^T = [\nabla g_1(x), \nabla g_2(x), \ldots, \nabla g_m(x)] \qquad (19) $$

where A(x)^T is an n x m matrix. The QP sub problem is then obtained by linearizing the nonlinear constraints at (x_k, \lambda_k):

$$ \min_d \; \tfrac{1}{2}\, d^T H_k d + \nabla f(x_k)^T d \qquad (20) $$

subject to

$$ \nabla g_j(x_k)^T d + g_j(x_k) = 0, \quad j = 1, 2, \ldots, m_e $$
$$ \nabla g_k(x_k)^T d + g_k(x_k) \ge 0, \quad k = m_e + 1, \ldots, m \qquad (21) $$

where H is the Hessian of the Lagrangian, given by \nabla^2_x L(x, \lambda), and d can be found by solving the equation given below

$$ \begin{bmatrix} H(x,\lambda) & -A(x)^T \\ A(x) & 0 \end{bmatrix} \begin{bmatrix} d \\ \delta_\lambda \end{bmatrix} = \begin{bmatrix} -\nabla f(x_k) + A(x)^T \lambda_k \\ -g(x_k) \end{bmatrix} \qquad (22) $$

with \delta_\lambda the accompanying step in the multipliers. The solution of the above sub problem (20) and (21) is used to form a new iterate as given below

$$ x_{k+1} = x_k + \alpha_k d_k \qquad (23) $$

The general purpose SQP algorithm for solving a nonlinear constrained optimization problem is given below:

Step 1: Choose the initial point (x_0, \lambda_0).

Step 2: Initialize the Hessian estimate, say H_0 = I.

Step 3: Evaluate f_0, g_0 and A_0.

Step 4: Begin the major iteration loop in k.

Step 4.1 If the termination criteria are met, then stop.

Step 4.2 Compute d_k by solving equation (22).

Step 4.3 Using d_k, solve the QP sub problem (20) and (21).

Step 4.4 Using the results of step 4.3, find $x_{k+1} = x_k + \alpha_k d_k$.

Step 4.5 Evaluate f_{k+1}, g_{k+1} and A_{k+1}.

Step 4.6 Compute $\lambda_{k+1} = [A_{k+1} A_{k+1}^T]^{-1} A_{k+1} \nabla f(x_{k+1})$.

Step 4.7 Set $s_k = \alpha_k d_k$ and $y_k = \nabla_x L(x_{k+1}, \lambda_{k+1}) - \nabla_x L(x_k, \lambda_{k+1})$.

Step 4.8 Obtain H_{k+1} by updating H_k using any quasi-Newton formula.

Step 5 End major iteration loop.

4. Implementation of DE-TCR and SQP for UC-ELD Problem

The DE-TCR algorithm determines the solution of the UC problem and SQP determines the solution of the ELD problem. Before the DE-TCR algorithm applied to the UC-ELD problem is formally presented, sections 4.1 to 4.8 demonstrate the various steps involved in the DE-TCR and SQP algorithms with numerical examples. Section 4.9 presents the DE-TCR algorithm applied to the UC problem. The step by step implementation is given for a 4 unit, 8 hour UC-ELD problem. The data for the 4 unit, 8 hour UC-ELD system are given in Table 2. Let the population size be 500 and the total number of generations be 1000.

4.1 Initial Generation of population for UC problem

Here the total number of generators is NG = 4. Further, the total time horizon is NT = 8 hrs, which is also the length of the chromosome. Say, for example, the initial representation of the intelligent chromosome is as shown in Figure 3.

Each decision variable is multiplied by the decimal number '15', since it is a 4 unit system (the maximum 4 bit binary, '1111', has the decimal equivalent '15'). The interpretation of the ON/OFF status of each unit from the intelligent chromosome is shown in Table 1.

In the same way an initial population of 500 chromosomes is created, from which a set of 500 ON/OFF status solutions for the 4 units can be obtained.

4.2 Evaluating Fitness function of UC problem
The fitness function is given in (1). It involves two modules, namely (i) the fuel cost (ELD) problem and (ii) the start up cost. In this section the start up cost is optimized using DE-TCR, subject to constraints (2) to (6). Therefore, for the ON/OFF status solution obtained from each chromosome, constraints (2) to (6) are verified. Infeasible solutions are penalized with a high fitness value, say 1 x 10^100. For feasible solutions the fitness is calculated using module (ii), i.e.

$$ \sum_{t=1}^{NT} \sum_{i=1}^{NG} SC_{i,t}\,(1 - U_{i,t-1})\,U_{i,t} $$



Table 1: Interpretation of Unit Schedule from intelligent chromosome for 4 unit, 8 hour system

Decision variable   D x decimal number (15)   Unit status (D1 converted into binary equivalent)
value (D)           (D1)                      Unit 1   Unit 2   Unit 3   Unit 4
0.814               12.210                    1        1        0        0
0.905               13.575                    1        1        0        1
0.127               1.9050                    0        0        0        1
0.913               13.695                    1        1        0        1
0.632               9.480                     1        0        0        1
0.097               1.455                     0        0        0        1
0.278               4.170                     0        1        0        0
0.546               8.190                     1        0        0        0
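
The decoding shown in Table 1 and the start up cost term of the fitness can be sketched as below. The function names are hypothetical. Two points are assumptions of this sketch: the scaled value is truncated to an integer before conversion to binary (inferred from rows such as 13.575 giving 1101), and each unit has a single constant start up cost, whereas the test data distinguish hot and cold start costs.

```python
def decode_segment(d, n_units):
    """Map one decision variable d in [0, 1] to a list of ON/OFF bits."""
    scaled = int(d * (2 ** n_units - 1))  # truncation, e.g. 13.575 -> 13
    return [int(bit) for bit in format(scaled, "0{}b".format(n_units))]


def decode_chromosome(chromosome, n_units):
    """One list of unit statuses per hour; the first bit is Unit 1."""
    return [decode_segment(d, n_units) for d in chromosome]


def startup_cost(schedule, sc, initial_status):
    """Sum of SC_i * (1 - U_{i,t-1}) * U_{i,t} over all hours and units."""
    total = 0.0
    previous = initial_status
    for hour in schedule:
        for i, (u_prev, u_now) in enumerate(zip(previous, hour)):
            total += sc[i] * (1 - u_prev) * u_now
        previous = hour
    return total
```

For example, `decode_segment(0.905, 4)` gives `[1, 1, 0, 1]`, matching the second row of Table 1, and `decode_segment(0.7321, 10)` gives the bit pattern 1011101100 used in the 10 unit illustration of the intelligent coding.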



4.3 DE-TCR operators on UC problem
There are mainly two operators in DE-TCR, namely mutation and adaptive crossover, as explained below.

4.3.1 Mutation

The mutant vector is calculated using (7). Three different 'f' values, namely the minimum 'f' (minf), median 'f' (medf) and maximum 'f' (maxf), are used. Typical values of the triangular distribution factor of 'f' are minf = 0.3, medf = 0.4 and maxf = 0.5. The value of 'f' for each chromosome is selected based on the procedure given below

If rand < [(medf - minf) x (2/(maxf - minf)) x 0.5]
    $f = minf + \sqrt{rand \times (maxf - minf) \times (medf - minf)}$
else
    $f = maxf - \sqrt{(1 - rand) \times (maxf - minf) \times (maxf - medf)}$
                                                                  (24)

where 'rand' is a random number in the interval [0, 1]. For the distribution factor [0.3 0.4 0.5], the value of [(medf - minf) x (2/(maxf - minf)) x 0.5] in equation (24) is 0.5. For a chromosome with a random number 0.3341 (say), the value of 'f' obtained from equation (24), and used to calculate the mutant vector with (7), is 0.3817. Hence, depending on the random number generated, each chromosome will have its own value of 'f'. The triangular distribution factor for 'f', however, is fixed and does not change over the generations.

[Figure 3: Example representation of Decision Variable in 4 units, 8 hour system. One decimal value per hour from 1 hr to 8 hr, e.g. 0.814 for the 1st hour.]

4.3.2 Adaptive Crossover mechanism
Let the initial triangular distribution factor for crossover be [0.2 0.5 1] (say). For this range, the value of [(medcr - mincr) x (2/(maxcr - mincr)) x 0.5] in equation (11) is 0.375. For a chromosome with a random number 0.1472 (say), the value of 'C_r' using (11) is 0.3880. Since the value of 'C_r' is less than 0.5, binomial crossover is selected for that particular chromosome. In the same way the values of 'C_r' are calculated for the complete chromosome set and the corresponding crossover is selected. Before the start of the next generation, the new values of the triangular distribution factor for crossover are adapted using equation (12).
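
As a check on the numbers in this example, the arithmetic of equation (11) for the triangle [0.2 0.5 1] and rand = 0.1472 can be written out step by step:

```latex
\begin{align*}
(medcr - mincr) \times \frac{2}{maxcr - mincr} \times 0.5
  &= 0.3 \times 2.5 \times 0.5 = 0.375, \\
rand = 0.1472 &< 0.375, \\
C_r = 0.2 + \sqrt{0.1472 \times 0.8 \times 0.3}
  &= 0.2 + \sqrt{0.035328} \approx 0.2 + 0.1880 = 0.3880 .
\end{align*}
```

Since the random number lies below the threshold, the first branch of (11) applies, and the resulting value is below 0.5.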

4.4 Local search using SQP for UC problem
The new offspring (children) are created using the operators of sections 4.3.1 and 4.3.2. Once the new children are formed, \alpha_LS is calculated using (13). Following condition (14), a local search is performed on a child whenever rand < \alpha_LS.

4.5 Evaluating the offspring

For the new offspring created in section 4.4 the fitness function, i.e. $\sum_{t=1}^{NT} \sum_{i=1}^{NG} SC_{i,t}\,(1 - U_{i,t-1})\,U_{i,t}$, is evaluated.



4.6 Population refreshment for UC problem
In this module, a fraction of the population is reinitialized within new search space bounds to obtain population refreshment. Let the value of the threshold of IQR be \gamma_var = 2 (say). For the complete population (chromosome set) the interquartile range (IQRrange) is found, i.e. the difference between the 75th and 25th percentiles of the population. This value is compared as shown below

$$ \text{If } IQR < \frac{x^{*}_{upper} - x^{*}_{lower}}{\gamma_{var}}, \text{ reinitialize within the new search space bounds} \qquad (25) $$

where x*_upper and x*_lower are the bounds of the chromosomes since the last population refreshment, and x_lower, x_median and x_upper are the lower, median and upper bounds of the chromosomes at the current generation.

After the above process, the actual bounds of the decision variables are checked; if a bound is violated, the violating values are reinitialized.
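
The refreshment trigger can be sketched as follows. Only the test on the interquartile range is shown, since the reinitialization step itself is not fully specified here. The function names are hypothetical, the percentile uses linear interpolation between sorted values (one of several common conventions), and the check is applied to the values of a single decision variable across the population; all three are assumptions of this sketch.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between neighbours."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def needs_refresh(values, upper_star, lower_star, gamma_var):
    """True when the IQR of the population falls below the threshold."""
    iqr = percentile(values, 75) - percentile(values, 25)
    return iqr < (upper_star - lower_star) / gamma_var
```

When `needs_refresh` returns True, the population would be reinitialized within the new bounds and the stored bounds x*_upper and x*_lower updated.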

4.7 Check for Convergence for UC problem

Once the new population is generated, its fitness values are evaluated and the convergence criterion, i.e. the maximum of 1000 generations, is checked. If the criterion is not met, the DE-TCR operators are applied to the current population and the iterative process continues until the convergence criterion is met.

4.8 SQP for ELD problem

Once the complete DE-TCR process is over, the final population gives the optimal unit schedule. For this schedule the SQP algorithm explained in section 3.2.5 is performed to minimize the objective $\sum_{i=1}^{NG} FC_i U_i$; hence the optimal generation

schedule for the given optimal unit schedule is obtained. 
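
To make the ELD sub problem concrete, the sketch below dispatches a fixed set of committed units. It deliberately swaps SQP for the classical lambda-iteration (equal incremental cost) method solved by bisection, because that needs no external solver; it is not the paper's SQP routine. The function name, the tuple layout (a, b, c, P_min, P_max) and the quadratic fuel cost a + bP + cP^2 with c > 0 are assumptions of this sketch.

```python
def economic_dispatch(units, demand, iterations=200):
    """Dispatch committed units; units is a list of (a, b, c, p_min, p_max)."""
    if not sum(u[3] for u in units) <= demand <= sum(u[4] for u in units):
        raise ValueError("demand outside the capacity of the committed units")

    def outputs(lam):
        # Each unit runs where its marginal cost b + 2cP equals lam,
        # clipped to its generation limits.
        return [min(max((lam - b) / (2.0 * c), p_min), p_max)
                for (_a, b, c, p_min, p_max) in units]

    lo = min(b + 2.0 * c * p_min for (_a, b, c, p_min, _pmax) in units)
    hi = max(b + 2.0 * c * p_max for (_a, b, c, _pmin, p_max) in units)
    for _ in range(iterations):  # bisection on the incremental cost lam
        mid = 0.5 * (lo + hi)
        if sum(outputs(mid)) < demand:
            lo = mid
        else:
            hi = mid
    power = outputs(hi)
    cost = sum(a + b * p + c * p * p
               for (a, b, c, _pmin, _pmax), p in zip(units, power))
    return power, cost
```

With units 2 and 3 of the case 1 data committed and a demand of 450 MW, the call `economic_dispatch([(585.62, 16.95, 0.0042, 60, 250), (648.74, 16.8, 0.0021, 75, 300)], 450)` loads unit 3 to its 300 MW limit and unit 2 to 150 MW. The cost value it returns depends on the assumed quadratic form and coefficient order.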

4.9 Algorithm and flow chart for UC-ELD problem using DE-TCR and SQP

The algorithm for UC-DE-TCR is given below:

Step 1: Read the population size, the size and length of the decision variables, the unit commitment data, etc.

Step 2: Initialize the population within the search space.

Step 3: Increase the counter for the number of generations.

Step 4: For each hour, from the information obtained from every chromosome, switch the corresponding generators ON or OFF and obtain a set of UC solutions equal in number to the population size.

Step 5: Check the feasibility of each solution against the UC constraints. For the chromosomes that satisfy the constraints, compute the fitness using $\sum_{t=1}^{NT} \sum_{i=1}^{NG} SC_{i,t}\,(1 - U_{i,t-1})\,U_{i,t}$. The remaining chromosomes are penalized with a high fitness.

Step 6: Perform the mutation and crossover operations using the triangular distribution factors.

Step 7: Perform a local search using SQP for every child (offspring). Evaluate the fitness of each individual using (1). Ensure elitism by comparing the fitness of child and parent and replacing accordingly.

Step 8: Record the number of successes; reset the cumulative number and recalculate the triangular distribution factor for 'C_r'. Check for population refreshment.

Step 9: Check for convergence or whether the maximum number of generations is reached. If YES, GO TO Step 10; else GO TO Step 3.

Step 10: For the optimal unit schedule obtained in step 9, run the SQP algorithm explained in section 3.2.5 to minimize $\sum_{i=1}^{NG} FC_i U_i$ for the ELD problem and hence obtain the optimal generation schedule.

Step 11: Print the optimal values of the decision variables and STOP.

The flow chart for the UC-ELD problem using DE- 
TCR and SQP is as shown in Figure 4. 

5. Numerical Results and Discussion 

The application of DE-TCR and SQP to the UC-ELD problem is presented and the results are compared against known reported methods. The DE-TCR, SQP, UC and ELD programs are executed in MATLAB R2009b. Two test cases are considered. Case 1: standard 4 unit, 8 hour system (Table 2); Case 2: standard 10 unit, 24 hour system (Table 5). The parameter values of the DE-TCR algorithm are: population 500 and maximum generations 1000. The triangular distribution factors for 'f' and 'C_r' are [0.3 0.4 0.5] and [0.2 0.5 1.0] respectively. For SQP the default settings of MATLAB R2009b are used.

Case 1: 4 Units, 8 hour Test system
The data for the case 1 test system are given in Table 2. DE-TCR and SQP are applied to this system and the results obtained are shown in Table 3. It is evident from Table 3 that the optimal production cost for the complete 8 hour demand is 73796.7 $ and the optimal start up cost is 520.02 $. Therefore the optimal total cost for the complete schedule using DE-TCR and SQP is 74316.72 $. The comparison of the result obtained using the proposed method against the other reported methods is shown in Table 4. From Table 4 it is evident that the proposed method attains a lower total cost than the other methods.

Case 2: 10 Units, 24 hour Test system

The data for the case 2 test system are given in Table 5. DE-TCR and SQP are applied to this system and the results obtained are shown in Table 6. It is evident from Table 6 that the optimal production cost for the complete 24 hour demand is 555508.9 $ and the optimal start up cost is 8240.0 $. Therefore the optimal total cost for the complete schedule using DE-TCR and SQP is 563748.9 $. The comparison of the result obtained using the proposed method against the other reported methods is shown in Table 7. From Table 7 it is evident that the proposed method attains a lower total cost than the other methods.

6. Conclusions 

This paper presents a novel method, namely DE-TCR and SQP, for the UC-ELD problem. The salient features of the proposed DE-TCR are:

(1) Adaptive crossover in DE, which links the crossover probability to the non-separability and decision variable dependency of UC problems,

(2) Local search in each chromosome using SQP, which has been shown to improve the performance of the classical DE algorithm, and

(3) Intelligent chromosome representation, which is capable of searching a large space for the purpose of obtaining the global optimal solution.

The proposed algorithm is tested with standard 4 unit, 8 hour and 10 unit, 24 hour test systems. Results indicate that the proposed method finds a better solution than the conventional methods reported in the literature.

[Figure 4: Flow chart for UC-ELD using DE-TCR and SQP algorithm]
Table 2: Case 1: Test system Data

Units  P_min  P_max  Cold      MUT    MDT    Initial      Hot       Cold Start  Cost Coefficients
       (MW)   (MW)   Cost ($)  (Hrs)  (Hrs)  State (Hrs)  cost ($)  (hrs)       a        b      c
1      25     80     350       4      2      -5           150       4           213      20.74  0.0018
2      60     250    400       5      3      8            170       5           585.62   16.95  0.0042
3      75     300    1100      5      4      8            500       5           648.74   16.8   0.0021
4      20     60     0.02      1      1      -6           0         0           252      23.6   0.0034

Hours   1    2    3    4    5    6    7    8
Demand  450  550  600  540  400  280  290  500









Table 3: Case 1: UC-ELD solution using DE-TCR and SQP

Unit/   Load  G1    G2    G3    G4    Production  Startup   Cumulative Total
Hours   (MW)  (MW)  (MW)  (MW)  (MW)  Cost ($)    cost ($)  Cost ($)
1       450   0     150   300   0     9208.4      0         9208.4
2       530   0     230   300   0     10648.4     0         19856.8
3       600   50    250   300   0     12265.4     350       32472.2
4       540   25    215   300   0     11113.4     0         43585.6
5       400   80    0     300   20    8534.1      0.02      52119.72
6       280   25    0     255   0     5872        0         57991.72
7       290   25    0     265   0     6046.6      0         64038.32
8       500   0     200   300   0     10108.4     170       74316.72
Total Cost                            73796.7     520.02    74316.72



Table 4: Case 1: Comparison of results with other methods

Solution Methods             Total Cost ($)
LR [38]                      74808
LR-PSO [38]                  75231.9
FL [38]                      74683.6
ACO [38]                     74520.34
DE-TCR and SQP (Proposed)    74316.72



Table 5: Case 2: Test system data

Units  P_min  P_max  Cold      MUT    MDT    Initial      Hot       Cold Start  Cost Coefficients
       (MW)   (MW)   Cost ($)  (Hrs)  (Hrs)  State (Hrs)  cost ($)  (hrs)       a      b      c
1      150    455    9000      8      8      8            4500      5           1000   16.19  0.00048
2      150    455    10000     8      8      8            5000      5           970    17.26  0.00031
3      20     130    1100      5      5      -5           550       4           700    16.6   0.002
4      20     130    1120      5      5      -5           560       4           680    16.5   0.00211
5      25     162    1800      6      6      -6           900       4           450    19.7   0.00398
6      20     80     340       3      3      -3           170       2           370    22.26  0.00712
7      25     85     520       3      3      -3           260       2           480    27.74  0.00079
8      10     55     60        1      1      -1           30        0           660    25.92  0.00413
9      10     55     60        1      1      -1           30        0           665    27.27  0.00222
10     10     55     60        1      1      -1           30        0           670    27.79  0.00173

Hours   1     2     3     4     5     6     7     8     9     10    11    12
Demand  700   750   850   950   1000  1100  1150  1200  1300  1400  1450  1500

Hours   13    14    15    16    17    18    19    20    21    22    23    24
Demand  1400  1300  1200  1050  1000  1100  1200  1400  1300  1100  900   800
Table 6: Case 2: UC-ELD solution using DE-TCR and SQP

Unit/  Load  G1    G2    G3    G4    G5    G6    G7    G8    G9    G10   Production  Startup  Cumulative
Hrs    (MW)  (MW)  (MW)  (MW)  (MW)  (MW)  (MW)  (MW)  (MW)  (MW)  (MW)  Cost        cost     Total Cost
1      700   455   245   0     0     0     0     0     0     0     0     13683.1     0        13683.1
2      750   455   295   0     0     0     0     0     0     0     0     14554.5     0        28237.6
3      850   455   395   0     0     0     0     0     0     0     0     16301.9     0        44539.5
4      950   455   455   0     0     0     0     0     30    0     10    19742.7     120      64402.2
5      1000  455   405   130   0     0     0     0     0     0     10    20316.8     1100     85819
6      1100  455   455   130   0     0     60    0     0     0     0     21976.3     340      108135.3
7      1150  455   425   130   130   0     0     0     0     10    0     23517.7     1180     132833
8      1200  455   440   130   130   0     20    25    0     0     0     24834.7     690      158357.7
9      1300  455   455   130   130   105   0     25    0     0     0     26842.1     1800     186999.8
10     1400  455   455   130   130   162   43    25    0     0     0     29365.9     170      216535.7
11     1450  455   455   130   130   162   80    38    0     0     0     30583.2     0        247118.9
12     1500  455   455   130   130   162   80    70    0     0     17.7  32644.4     60       279823.3
13     1400  455   455   130   130   162   48    0     0     10    10    30192.5     60       310075.8
14     1300  455   455   130   130   120   0     0     0     10    0     26915       0        336990.8
15     1200  455   455   0     130   125   0     25    10    0     0     25282.3     320      362593.1
16     1050  455   455   0     130   0     0     0     0     10    0     21151.9     60       383805
17     1000  455   380   0     130   0     0     25    10    0     0     20933.7     320      405058.7
18     1100  455   455   0     130   60    0     0     0     0     0     21860.3     900      427819
19     1200  455   440   130   130   25    20    0     0     0     0     24605.7     890      453314.7
20     1400  455   455   130   130   162   68    0     0     0     0     28762.2     0        482076.9
21     1300  455   455   130   130   130   0     0     0     0     0     26184       0        508260.9
22     1100  455   455   0     130   0     50    0     0     10    0     22652.7     230      531143.6
23     900   455   455   0     0     0     0     0     0     0     0     17177.9     0        548321.5
24     800   455   345   0     0     0     0     0     0     0     0     15427.4     0        563748.9
Total Cost                                                               555508.9    8240     563748.9



Table 7: Case 2: Comparison of results with other methods

Solution Methods             Total Cost ($)
LR [23]                      565825
GA [23]                      565825
EP [23]                      564551
SA [23]                      565828
IPSO [23]                    563954
NBACO [23]                   563977
ICGA [29]                    566404
BF [29]                      564842.0
HPSO [29]                    563942.3
ICA [29]                     563937.7
DE-TCR and SQP (Proposed)    563748.9



Nomenclature

P_i            Amount of real power generated in the i-th unit
NT             Total schedule period in hours
NG             Number of generating units in the plant
i              Index of units (i = 1, 2, 3, ...)
t              Index of time (t = 1, 2, ..., NT)
FC_{i,t}       Fuel cost of the i-th unit at time t
SC_{i,t}       Generation startup cost of the i-th unit at time t
U_{i,t}        Unit ON/OFF status of the i-th unit (= 1 if the unit is ON; = 0 if the unit is OFF)
PD_t           Real power demand at given time t
T_{i,t}^{on}   Total time for which the i-th unit is turned ON at time t
T_{i,t}^{off}  Total time for which the i-th unit is turned OFF at time t
MUT_i          Minimum up time for the i-th unit
MDT_i          Minimum down time for the i-th unit
a_i, b_i, c_i  Fuel cost coefficients
Abstract

This paper focuses on a proposal for e-assessment whose goal is to offer an alternative solution to the problem of school failure by attending to diversity from the perspective of evaluation. The proposal relies on an ontological model for personalizing assessment activities. In this paper we present the design of three ontologies that represent the thinking style of students, the assessment activities and the learning path. These ontologies make it possible to recommend different types of activities for developing the skills that students need to cover the course objectives. The model is the result of fieldwork conducted with engineering students from the Universidad Autonoma Metropolitana-Azcapotzalco.

Keywords: assessment, research, technology, 

learning design, ontological model. 

1. Introduction 

The evident diversity of the academic community
presents a challenge for the new educational paradigm
and requires the adoption of a model that gives all
students access to knowledge and learning, which
implies recognizing the differences between
individuals. This paper presents an ontological model
whose purpose is to recommend personalized assessment
activities for engineering students of the Universidad
Autonoma Metropolitana Azcapotzalco, according to
their learning profile in a virtual learning
environment.



The educational models in Mexico do not respond to the
problem of school failure, which includes academic lag,
course failure and low terminal efficiency. However,
this problem is not restricted to Mexico and can be
considered internationally relevant, as can be seen in
the high number of articles, publications and books
that analyze it [1]. Thus, we believe that it is
necessary to propose new alternatives, to experiment
and to find better solutions to these problems.

In this sense, we consider that a new model must take
advantage of the new information and communication
technologies to overcome educational problems. In
addition, we include ontological models to add
intelligence to the selection mechanisms of
e-assessment in order to attend to student diversity.
In this paper we propose an ontological model that
designs personalized assessment activities and supports
e-assessment, based on fieldwork carried out over two
years, with the goal of improving school efficiency.

The Ontological Model for Personalized Assessment
Activities (OMPAA) integrates three ontologies to
represent the thinking style of students, the
assessment activities and the learning path. Starting
from the skills that students must develop to complete
the course, the ontological model sets the path of
personalized assessment activities. In this paper, we
consider personalization as a process of individuation
and realization of the subject. In psychoanalytic
theories, personalization implies differentiation and
individuation [2].

The ontologies are represented in the Web Ontology
Language (OWL) [3], the latest standard language
proposed by the W3C for representing ontologies on the
Web, and are implemented with the Protege tool [4].



The rest of the paper is structured as follows.
Section 2 presents the assessment concept. Section 3
describes the assessment activities and their
relationship with the thinking style. Section 4 reviews
ontologies related to education. Section 5 presents the
proposed ontological model. Section 6 contains the
experimental work and results. Finally, the conclusions
are presented in Section 7.

2. The assessment 

The term "assessment" is often used synonymously with
proof, exam or test. Assessment is a process, or a
programmed set of activities, of reflection on action,
supported by systematic procedures for collecting,
analyzing and interpreting information in order to make
informed and communicable judgments, make
recommendations, take decisions, review actions, and
present and improve future actions [5].

Assessment is a process that integrates multiple
activities that allow students to practice concepts,
understand them and associate them with their prior
knowledge. At the same time, assessment is an indicator
that allows the teacher to give feedback to the student
and to adjust the strategy being implemented in the
student's personalized learning activity.



3. Assessment activities and their
relationship with the thinking style

Metacognition can be defined as the degree of awareness
or knowledge that individuals possess about their ways
of thinking, considering cognitive processes and
events, the structures, and the ability to control
these processes (to organize, review and modify them)
based on learning outcomes [6,7,8].

To address the thinking style, theories from cognitive
neuroscience were evaluated [9,10,11]. Sperry's
research confirms the specialization of the cerebral
hemispheres. The right hemisphere is specialized in
simultaneous processing; it integrates and organizes
information globally, focuses on relationships, and is
holistic [10]. Verlee suggests that the difference
between the two brain hemispheres is their style of
processing information [12]. MacLean presents a
different view of how the brain works, complementing
Sperry's investigations. MacLean's conceptualization
suggests that the brain consists of three
interconnected components, the reptilian brain, the
limbic system and the neocortex, which are responsible
for controlling human behavior [9]. Ned Herrmann,
building on Sperry's studies of cerebral dominance and
MacLean's triune brain theory, as well as on the
results of his own investigations using biofeedback
equipment and electroencephalography, rethinks the
problem of cerebral dominance (Ruiz Bolivar and Cols.,
1994, cited in [13]).

For the purposes of this work we selected the whole
brain theory of Ned Herrmann. His model integrates the
neocortex (right and left hemispheres) with the limbic
system, obtaining four quadrants from this
intersection. These quadrants determine different
styles of information processing in individuals [11].
Based on this theory, we classified the assessment
activities that can be programmed in a course into four
sets, as shown in Figure 1.

Figure 1. Assessment activities and whole brain theory
of Ned Herrmann.



The whole brain theory divides the brain into four
quadrants. Quadrant A (Logical) focuses on answering
"what?"; it is fact-based, analytical, logical and
rational. Quadrant B (Process) seeks an explanation of
"how?"; it is organized, sequential, conservative and
orderly. Quadrant C (Relational) responds to "who?"; it
is interpersonal, emotional and intuitive, and its
interrelation with others is critical, so quadrant C
prefers activities that involve other people. Finally,
Quadrant D (Creative) answers the "why?" of things; it
is inclusive, visionary and risk-taking, and is excited
to create and invent. The A and B quadrants are related
to the left hemisphere of the brain, while the C and D
quadrants are associated with the right hemisphere.

Classifying the various assessment activities according
to the characteristics of each quadrant is the first
step in customizing the assessment to address the
diversity of students' thinking. For quadrant A we
recommend assessment tools such as desk research, case
analysis and written examinations, among others. For
quadrant B it is advisable to use concept maps, summary
tables and tests, among others. For quadrant C we
propose evaluation activities carried out on a
computer; for example, the use of social networks is
highly recommended, as are videoconferencing,
discussion, and prototype design and its
representation. Finally, for quadrant D it is advisable
to apply fieldwork, problem solving based on acquired
knowledge, and activities that put into practice what
students have learned, such as prototyping and project
development, among others.
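The quadrant-to-activity classification described above can be sketched as a simple lookup. This is only an illustration: the activity lists come from the text, while the dictionary layout, the function name and the numeric dominance scores are assumptions made for the example, not part of the authors' model.

```python
# Assessment activities per whole-brain quadrant, as listed in the text.
# The data layout and function name are illustrative assumptions.
ACTIVITIES_BY_QUADRANT = {
    "A": ["desk research", "case analysis", "written examination"],
    "B": ["concept map", "summary table", "test"],
    "C": ["social networks", "videoconferencing", "discussion",
          "prototype design and representation"],
    "D": ["fieldwork", "problem solving", "prototyping",
          "project development"],
}


def recommend_activities(profile):
    """Return the activities for the student's dominant quadrant(s).

    `profile` maps each quadrant letter to a dominance score (the scale is
    hypothetical). Ties are kept, so a student with two equally dominant
    quadrants receives both activity lists.
    """
    top = max(profile.values())
    dominant = sorted(q for q, score in profile.items() if score == top)
    return {q: ACTIVITIES_BY_QUADRANT[q] for q in dominant}


# Made-up profile with a tie between the logical and creative quadrants.
print(recommend_activities({"A": 72, "B": 55, "C": 40, "D": 72}))
```

Keeping ties rather than forcing a single quadrant is a design choice of this sketch; the paper does not state how equal dominance scores are handled.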
Classifying personalized assessment activities by the
student's thinking style is conceptually simple;
however, it is a complex task for the teacher to
perform manually. We therefore propose an ontological
model that automates this activity and recommends to
the teacher a set of activities that should be
scheduled.

4. Related work 

This section is divided into two subsections: the first
reviews a set of ontology repositories and the second
describes the tools used to build the ontologies in
this work.

4.1. Related ontologies 

In order to check whether existing ontologies or
metadata could be reused, we reviewed the ontologies
available in the following ontology repositories:

1. http://www.ksl.stanford.edu/software/ontolingua

2. http://www.daml.org/ontologies/

3. http://www.dmoz.org/Reference/Education/

4. http://www.unspsc.org

A total of 30,773 ontologies related to the knowledge
domain of education were located, of which a) 663 are
related to online learning, b) 58 are related to
educators and c) 1,246 focus on the administration or
management of the institution. However, no ontologies
related to the learning profile and to personalized
assessment through a learning path were found. Below we
list a set of ontologies closely related to the
educational field:

• Ontology
http://www.cs.umd.edu/projects/plus/DAML/onts/univ1.0.daml

This ontology describes the activities that take place
within a university and considers the entire management
and research environment. The teaching part is limited
to the courses offered; it does not specialize in
assessment and learning profiles.

It can be accessed from:
http://www.daml.org/ontologies/63.

The link to access the technical information is:
http://www.cs.umd.edu/projects/plus/DAML/onts/univ1.0.daml.

• Ontology
http://www.ksl.stanford.edu/projects/DAML/ksl-daml-instances.daml

This ontology focuses on people, projects, articles and
academic organization. It does not properly consider
the issues associated with assessment and learning
profiles.

It can be accessed from:
http://www.ksl.stanford.edu/projects/DAML/ksl-daml-instances.daml.

The link to read its technical description is:
http://www.ksl.stanford.edu/projects/DAML/ksl-daml-descr.daml.

• Ontology
http://www.ksl.stanford.edu/projects/DAML/ksl-daml-desc.daml

The topics of interest of this ontology are projects,
articles and research programs.
http://www.ksl.stanford.edu/projects/DAML/ksl-daml-instances.daml.

To access the technical information of the ontology we
refer the reader to the link:
http://www.ksl.stanford.edu/projects/DAML/ksl-daml-desc.daml.

• Ontology
http://www.aktors.org/ontology/portal

This ontology describes a computer science academic
setting, considering departments, people, projects,
publications and research.

To access the technical information we refer the reader
to the link:
http://www.aktors.org/ontology/portal.

Hayashi, Bourdeau and Mizoguchi (2009) presented a
model called the OMNIBUS ontology, which organizes
learning and instructional theories to construct
learning scenarios [14]. Although this ontology focuses
on the construction of learning scenarios with a
selection of learning objects appropriate to different
learning states, it does not consider three fundamental
points that must be present during individual learning:
1) motivation, 2) individual learning preferences and
3) the cognitive skills to be developed during the
learning experience.

After an extensive review, we concluded that there are
no ontologies directly related to the knowledge domain
of the learning profile and personalized assessment.

4.2. Tools used to build ontologies 

To build the ontologies described in this work, we used
the Web Ontology Language (OWL), which is intended to
provide a language that can be used to describe the
classes and relations between them that are inherent in
Web documents and applications [3]. OWL can be used to
1) formalize a domain by defining classes and
properties of those classes, and 2) define individuals
and assert properties about them.

The OWL language provides three increasingly expressive
sublanguages designed for use by specific communities
of implementers and users. A) OWL Lite supports those
users who primarily need a classification hierarchy and
simple constraint features. B) OWL DL supports those
users who want the maximum expressiveness without
losing computational completeness. C) OWL Full is meant
for users who want maximum expressiveness and the
syntactic freedom of RDF with no computational
guarantees. Each of these sublanguages is an extension
of its simpler predecessor [3].

Ontologies in this work were built using Protege 
which is a free, open-source ontology editor and 
framework for building intelligent systems. 
Protege is supported by a strong community of 
academic, government, and corporate users, who 
use Protege to build knowledge-based solutions in 
areas as diverse as biomedicine, e-commerce, and 
organizational modeling. Protege’s plug-in 
architecture can be adapted to build both simple and 
complex ontology-based applications. Developers 
can integrate the output of Protege with rule 
systems or other problem solvers to construct a 
wide range of intelligent systems [4].

5. The proposed ontological model 

To achieve strategic learning in ICT-mediated
education, it is necessary to consider aspects such as
diversity and evaluation, which are key factors in
strategic learning.

The work of Silva [15] proposed the Strategic Learning
Meta-Model (SLM), an architecture whose goal is to
improve student performance and to form strategic,
self-regulated and self-reflective learners,
encouraging learning through an educational environment
that integrates a psycho-pedagogical model, an
ontological model and emerging technologies that enable
ubiquity. The strategic learning meta-model provides an
architecture consisting of three layers: the reactive
layer, the intelligent layer and the infrastructure
layer [15].

The ontological model is the intelligence of the system
and is composed of five ontologies: courses, assessment
activities, profiles, students and learning path.

Ontologies are one of the artificial intelligence
models that organize knowledge in a standard form, with
the purpose of categorizing information in such a way
that it can be processed by computers.

The term ontology has been adopted by artificial
intelligence as a mechanism to share and reuse
knowledge. According to Guarino [16,17], an ontology is
an artefact consisting of a specific vocabulary that
describes a knowledge domain, together with a set of
rules that explain the vocabulary. McGuinness defined
an ontology as the formal explicit description of
concepts in a domain, including its properties and
existing constraints [18].

Although there are many different definitions of
ontology, one of the most widely accepted is that of
Thomas Gruber: "a formal explicit specification of a
shared conceptualization" [19].

For each of the developed ontologies it was necessary
to identify the classes and key concepts, define the
class hierarchy, and then describe the properties and
relationships between classes. Note that all subclasses
of a class inherit the properties of that class, so a
property must be defined in the most general class
possible. Figure 2 presents the properties and
relationships of each class of the five ontologies.



Figure 2. Relationships between classes of ontologies.

The set of ontologies is used to personalize the
student's learning path based on his/her learning
profile. An application case was defined in a
structured programming course with engineering students
at the Universidad Autonoma Metropolitana-Azcapotzalco
(UAM-A).

The Ontological Model for Personalized Assessment
Activities presented here describes three of these
ontologies: learning path, student and assessment
activities. The design of these three ontologies was
based on the methodology of Noy & McGuinness [20], with
OntoDesign Graphics for the graphical representation.
OntoDesign Graphics is used to standardize the
graphical design of an ontology. In this paper we
present the design of the three previously mentioned
ontologies using the Protege-OWL editor and in OWL
format.

5.1. Learning path ontology 

The learning path ontology consists of two classes,
InstructionalPlan and LearningPath, as shown in
figure 3. These classes are populated by inference
rules that evaluate the information associated with the
student profile and the skills that students must
develop to complete the course. The inference rules
build the student's personalized learning path,
considering both assessment activities that motivate
students because they are associated with their
learning profile and assessment activities that are
necessary for students to develop the skills required
in the course.





Artificial Intelligence and Machine Learning Journal, ISSN 1687-4846, Volume 14, Issue 1, Delaware-USA, August 2014 




5.2. Student learning profile ontology 

The student learning profile ontology integrates two
classes, Student and Learning Profile, as shown in
figure 4. The ontology model of the learning profile
makes it possible to classify the student, with the
purpose of customizing teaching and addressing the
diversity of thinking and learning [21].

5.3. Assessment activities ontology 
(personalization of learning activities) 

The assessment activities ontology integrates three
classes: Activity, CognitiveSkill and Tool (as shown in
figure 5). The Activity class includes several
assessment activities; each activity has weights
associated with the cognitive skills that it can
develop in students, which are registered in the
CognitiveSkill class. The Tool class stores the
technology tools that can be used to carry out
assessment activities.
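As a rough illustration of how the three classes relate, the sketch below models Activity, CognitiveSkill and Tool as plain Python data classes and ranks activities by the weight they give to the skills a course requires. It is not the authors' OWL model: the `quadrant` attribute, the ranking rule, the weight values and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CognitiveSkill:
    name: str


@dataclass(frozen=True)
class Tool:
    name: str


@dataclass
class Activity:
    name: str
    quadrant: str  # hypothetical attribute: the thinking-style quadrant served
    skill_weights: dict = field(default_factory=dict)  # CognitiveSkill -> weight
    tools: list = field(default_factory=list)


def rank_activities(activities, required_skills, quadrant=None):
    """Order activities by the total weight they give to the required skills.

    Activities that develop none of the required skills are dropped. If
    `quadrant` is given, only activities of that quadrant are considered.
    """
    scored = []
    for activity in activities:
        if quadrant is not None and activity.quadrant != quadrant:
            continue
        weight = sum(activity.skill_weights.get(s, 0.0) for s in required_skills)
        if weight > 0:
            scored.append((weight, activity))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [activity for _, activity in scored]


# Made-up sample data.
analysis = CognitiveSkill("analysis")
synthesis = CognitiveSkill("synthesis")
forum = Tool("discussion forum")

activities = [
    Activity("written examination", "A", {analysis: 0.5}),
    Activity("case analysis", "A", {analysis: 0.8, synthesis: 0.2}, [forum]),
    Activity("concept map", "B", {synthesis: 0.7}),
]

for activity in rank_activities(activities, [analysis]):
    print(activity.name)
```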

6. Experimental work and results 

The case study was defined on two courses for
engineering students of the UAM-A: the structured
programming course and the numerical methods in
engineering course. Experiments were conducted over two
years, with a total of seven experiments in which
learning activities were personalized according to the
whole brain theory, as indicated in figure 1.




6.1. Methodology 

The methodology involves applying a test to the student
that determines his/her thinking style. The course is
then defined, together with the skills that the student
is required to develop. The student's thinking style is
used to create the personalized learning activities.

Upon completion of the course, the student is assessed
in personalized interviews, and his/her progress is
checked through the thinking style questionnaire.

Finally, we propose to systematize the experiment
through the design of an ontological model that makes
it possible to generalize the personalization of
learning activities to different courses.

6.2. Results 

The results of the test applied to the students in each
trimester of the structured programming (SP) course and
the numerical methods in engineering (NME) course are
shown in figures 6 and 7, respectively. An increase can
be observed in the average rating of all thinking
styles.



Figure 6. Average rating in the thinking style of SP-course.

Figure 7. Average rating in the thinking style of NME-course.

Figure 8 shows the assessment activities of the
structured programming course in the SAKAI portal. The
activities are selected according to the learning
profile. SAKAI [22] is the open-source Learning
Management System (LMS) in which both courses were
implemented.

7. Conclusions

The personalization of assessment activities is a
complex task that involves three variables: the
student's thinking style, the course and unit
objectives, and the evaluation technique. However,
through the set of ontologies, the personalization of
assessment activities encourages educational innovation
and invites teachers to invent, to reconstruct the
educational practices at each stage and to attend to
student needs, keeping in mind that everything is
immersed in a very specific context and that changes
must be adapted to this context. These interventions
allow the student to have better opportunities for
intellectual and moral development, as established by
Piaget [23] and Vygotsky [24,25].

The personalization of the learning path 1) allows
students to develop the desired competencies as part of
the course objectives, and 2) is a mechanism of
self-motivation: when students feel that they are
treated differently, their attitude changes throughout
the course.



The proposed ontological model strengthens assessment
as a mechanism not only for change but also for
strategic learning. Assessment becomes the main engine
of a new learning culture, enabling students to
continue learning throughout life.

The fieldwork provides the encouraging results
presented in figures 6 and 7. The average student
dominance increased most in the logical and creative
styles.

The developed ontologies involve a reengineering of
teaching whose success depends on a paradigm change in
the professional work of teachers, since they are the
principal assistants in student learning. Innovation,
creativity, invention and the search for alternatives
are the new work of every teacher immersed in education
mediated by new information and communication
technologies.
Abstract

The aim of the present study is to investigate the
potential of the Takagi-Sugeno fuzzy inference system
for seismic signal discrimination between quarry blast
and earthquake events. Such a fuzzy system can be an
appropriate classification platform due to its
robustness in incorporating imprecise knowledge and its
capability of integrating different kinds of
information, such as expert knowledge and information
coming from mathematical expressions. Relevant
seismogram characteristics are extracted from the
seismic signal in the time and frequency domains. Using
these characteristics, a fuzzy classifier was built
based on analyst experience. Classification results on
real seismic data show that the classifier performance
can reach 100%.

Keywords: Seismic discrimination, Seismic signal
processing, Feature extraction, Takagi-Sugeno fuzzy
inference.

1. Introduction 

A challenging and essential task in seismic monitoring
systems is to automatically discriminate between
natural seismicity and anthropogenic events such as
quarry blasts. The difficulty is due to the complexity
of the seismic signal on the one hand and the very high
volume of continuously recorded data on the other.
Owing to the central role of this task, a variety of
waveform-based discrimination methods have been
developed and investigated. These methods include the
spectral ratio of seismic phases or the average
amplitude in low and high frequency bands for a
specific phase (Hedlin et al., 1990 [1]; Gitterman and
Shapira, 1993 [2]; Gitterman et al., 1998 [3]; Koch and
Fah, 2002 [4]; Allmann et al., 2008 [5]; Dahy and
Hassib, 2010 [6]), statistical analysis (Kushnir et
al., 1990 [7]; Wuster, 1993 [8]; Kushnir et al., 1999
[9]), cross-correlation techniques (Harris, 1991 [10]),
and wavelet Bayesian classification (Gendron et al.,
2000 [11]). Other approaches use neural network
techniques (e.g., Falsaperla et al., 1996 [12]; Musil
and Plesinger, 1996 [13]; Muller et al., 1999 [14];
Dowla et al., 1990 [15]; Tiira, 1999 [16]; Jenkins and
Sereno, 2001 [17]; Ursino et al., 2001 [18]; Del Pezzo
et al., 2003 [19]; Scarpetta et al., 2005 [20];
Yildirim et al., 2010 [21]).

This study is concerned with the application of the
Takagi-Sugeno fuzzy system (Takagi and Sugeno, 1985
[22]) to the problem of seismic discrimination between
earthquake and quarry blast events. The study area is
Agadir city and its vicinity, in Morocco. To identify
the active tectonics and analyse the seismicity of the
region, a local seismic network was founded in 2001.
The network consists of five stations equipped with
vertical-component short-period seismometers whose
output is proportional to ground velocity. The five
stations are deployed around Agadir city and linked
with Agadir's regional seismic database via an
FM-modulated radio-frequency (RF) link, and with the
national database in Rabat via a terrestrial phone line
(Figure 1). Seismic data are continuously recorded and
transmitted in real time to Agadir's data center, where
they are digitized and processed. Each stored event
record should include a portion of seismic noise prior
to the detection time and a fixed recording time after
the detrigger time in order to ensure complete
recording of seismic events. Detection is performed by
a power detector in which the power over a short time
window (the short-term average, STA) is compared with
the power over a long time window (the long-term
average, LTA) [23]. The basic idea of the algorithm is
that an event is considered detected when the STA/LTA
ratio exceeds a predetermined threshold. Due to various
quarrying activities in the vicinity of the region,
many explosions are detonated and recorded by the
seismic network every day. Such anthropogenic events
contaminate the recorded natural seismicity of the
region and lead to misinterpretation of the results.
Therefore, an automatic procedure to discriminate
quarry blast seismograms from earthquake seismograms in
the seismicity catalog is crucial.
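The STA/LTA power detector described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the network's actual detector: the window lengths, the threshold and the synthetic test signal are made up, and both averages are taken over windows ending at the current sample.

```python
import math
import random


def sta_lta_ratio(signal, n_sta, n_lta):
    """STA/LTA power ratio per sample (0.0 until a full LTA window exists).

    Both averages use windows that end at the current sample; n_sta must not
    exceed n_lta.
    """
    # Prefix sums of the instantaneous power x^2 give O(1) window averages.
    csum = [0.0]
    for s in signal:
        csum.append(csum[-1] + s * s)
    ratio = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = (csum[i + 1] - csum[i + 1 - n_sta]) / n_sta
        lta = (csum[i + 1] - csum[i + 1 - n_lta]) / n_lta
        ratio[i] = sta / lta if lta > 0 else 0.0
    return ratio


def detect(signal, n_sta, n_lta, threshold):
    """Index of the first sample whose STA/LTA exceeds the threshold, or None."""
    for i, r in enumerate(sta_lta_ratio(signal, n_sta, n_lta)):
        if r > threshold:
            return i
    return None


# Synthetic record: low-level noise with a sinusoidal burst starting at sample 600.
rng = random.Random(0)
noise = [rng.uniform(-0.1, 0.1) for _ in range(1000)]
record = list(noise)
for k in range(200):
    record[600 + k] += 5.0 * math.sin(2 * math.pi * 0.05 * k)

print(detect(record, n_sta=20, n_lta=200, threshold=4.0))
```

In practice the threshold and window lengths would be tuned to the noise conditions at each station; the values here only serve the synthetic example.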
The driving force behind this work was to develop a
reliable and automatic procedure that is able to mimic
the analyst's reasoning in discriminating between
seismic events, and hence to automatically distinguish
earthquake events (EQ) from quarry blast events (QB)
recorded by the local seismic network of Agadir. We
addressed the problem using a fuzzy logic approach for
several reasons, among which are its capability of
modelling human reasoning and decision-making, the
possibility of incorporating different kinds of
information, such as expert knowledge and information
coming from mathematical expressions, and its
robustness in dealing with imprecise knowledge. Such
properties make a fuzzy classifier an appropriate
platform for integrating the noisy, imprecise or
incomplete information extracted from the seismic
signal with the knowledge gained from analyst
experience. This combination is expected to provide
good generalization performance.

The remainder of the paper is structured as follows.
The second section discusses the seismogram
characteristics and the parameters used for feature
extraction. The third section describes and explains
the functioning of the fuzzy classifier developed in
this study. The fourth section presents the results of
applying the proposed fuzzy classifier to real seismic
data. Finally, the fifth section reports some
significant conclusions of this study.



2. Data and feature extraction 

The seismic signal classification method proposed in
this study operates in two steps. The first step is
seismogram feature extraction, where a set of scalar
values representing significant seismogram features is
extracted. The second step is seismogram
classification, where the classifier uses the
previously extracted features to classify seismic
events. The performance of the classifier clearly
depends on the feature set used. The feature parameters
are usually selected based on the expertise and
experience of the seismic signal analyst. In this
section we present some seismograms of quarry blast
events recorded by the local seismic network of Agadir,
together with earthquake seismograms, in order to
compare these two types of event. Such a comparison is
useful for highlighting discriminant characteristics
between the signals generated by the two types of
event. Figure 2 depicts the vertical component
seismogram of two earthquakes (a, b) and two quarry
blasts (c, d).

2.1 Seismogram characteristics 

The simplest classification method used to discriminate
between earthquakes and quarry blasts is based on the
hour of event detection [24]. This method relies on the
fact that quarry blasts are detonated during working
hours. Therefore, the hourly, daily and weekly
distributions of earthquakes and quarry blasts can be
analysed and used for discrimination. Such a
statistical method is insufficient on its own because
it cannot separate quarry blasts from earthquakes
recorded during working hours, but it can be used
together with other methods to verify results.

Figure 2. Vertical component seismogram of two earthquakes (a, b) and two quarry blasts (c, d) recorded by the local seismic network of Agadir.

In order to extract other discriminant parameters, the
waveforms of these events must be investigated. Because
the source characteristics of the two types of event
differ, their waveform and spectrum characteristics are
expected to differ as well. In fact, an earthquake
source is a much more complicated phenomenon than an
explosion source. The spatial and temporal dimensions
of an earthquake are larger than those of an explosion
of comparable strength. Additionally, an explosion is
almost perfectly symmetric, sending out seismic waves
of approximately the same strength in all directions,
whereas most earthquake sources are highly asymmetric,
sending out different seismic signals in different
directions. These different source properties introduce
diverse characteristics into the seismic signal that
may be readily used to identify each type of source and
to discriminate between explosions and tectonic
earthquakes. The quarry blast waveform is dominated by
the P-wave (the first arrival), whereas the earthquake
has a much larger S-wave and surface waves.

A preliminary look at figure 2 suggests that the signal
envelope is a promising discrimination parameter. The
quarry blast records are characterized by overlapping P
and S waves, a less impulsive onset and a short
duration of coda waves. These characteristics blend
together to form a Gaussian envelope.

Dislocation sources such as earthquakes generate more
shear-wave (S-wave) energy than explosion sources. As
can be seen in figure 2, the signal associated with an
earthquake differs appreciably from that of an
explosion in that it involves large S waves, isolated
or overlapping P and S waves (depending on the
source-station distance) and an exponential decay of
coda amplitude with time.

Analysis of many explosion signals shows that all the
events have almost the same envelope and can be
recognized using the envelope alone. Unfortunately,
explosion signals are not the only ones that show this
feature; some earthquake signals have the same envelope
as an explosion. In order to find another parameter
that can differentiate these types of event, we
analysed their signals in the frequency domain.
Figure 4 displays seismograms of an earthquake and a
quarry blast with their associated FFT. Comparison of
the two seismograms reveals that the envelopes of these
two events are quite similar, whereas the difference
between their spectral content is clearly visible. It
was observed that quarry blast seismograms exhibit very
low spectral amplitude below 1 Hz, in contrast to
earthquake seismograms, which often show very high
spectral amplitude in the same band.

Another important parameter considered in this study is
the time duration of the event. Time duration is
influenced by many factors, mainly the characteristic
dimensions of the source. Thus, the time duration of
natural earthquakes is expected to be longer than that
of explosions. Examination of many events shows that
explosion records have durations of less than 40
seconds, while tectonic earthquake records may last for
several minutes.

2.2 Feature extraction 

The discriminant features discussed above can be
formulated mathematically as follows:

Envelope E: To extract the signal envelope, we use the
Hilbert transform (HT), which is capable of tracking the
amplitude envelope of the signal (figure 3).

Figure 3. Seismograms of an earthquake (a) and a quarry blast (c) and their corresponding envelopes (b) and (d).

From a given signal x(t), a complex signal A[x(t)] (also
known as the analytic signal) associated with the
original signal can be constructed as:

A[x(t)] = x(t) + jHT[x(t)] 

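The envelope of the signal is then defined as the modulus
of the analytic signal:

$$E(t) = \left| A[x(t)] \right| = \sqrt{x(t)^2 + \mathrm{HT}[x(t)]^2}$$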

A finite impulse response (FIR) filter is designed to
smooth the rapid variations of the envelope. As quarry
blast signals usually display the same shape, we have
chosen a quarry blast envelope as a template envelope,
which is compared with the envelope of each incoming
event.
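
The envelope extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy, builds the analytic signal with the usual FFT one-sided-spectrum construction, and uses a moving-average kernel as a stand-in for the FIR filter, whose actual design is not given in the text. The function names, the number of taps and the synthetic record are all hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Return A[x] = x + j*HT[x], built with the FFT one-sided-spectrum method."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin as is
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin as is
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def smoothed_envelope(x, taps=11):
    """Envelope |A[x]| passed through a moving-average FIR filter."""
    envelope = np.abs(analytic_signal(x))
    kernel = np.ones(taps) / taps         # assumed FIR design, not the paper's
    return np.convolve(envelope, kernel, mode="same")


# Synthetic record: a 5 Hz carrier under a Gaussian-shaped amplitude,
# sampled at 100 Hz for 10 s (values chosen only for illustration).
t = np.arange(1000) / 100.0
amplitude = np.exp(-0.5 * (t - 5.0) ** 2)
record = amplitude * np.cos(2 * np.pi * 5.0 * t)

envelope = np.abs(analytic_signal(record))
smooth = smoothed_envelope(record)
```

On this synthetic record the recovered envelope follows the Gaussian amplitude closely, which is the shape the text associates with quarry blasts.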

Time duration $T_d$: $T_d$ is defined as the total duration in
seconds of the event record from the P wave onset $t_p$ to
the end of the signal $t_{end}$. The latter is defined as the point
where the signal is no longer seen above the noise.

$$T_d = t_{end} - t_p$$

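
The duration measurement can be illustrated with a short sketch. The text does not say how "no longer seen above the noise" is decided, so the rule below is an assumption: the end of the signal is taken as the last sample whose envelope exceeds a multiple of the mean pre-event noise level. The function name, the threshold factor and the toy envelope are hypothetical.

```python
def event_duration(envelope, sampling_rate, onset_index, threshold_factor=3.0):
    """Return T_d = t_end - t_p in seconds.

    t_end is taken as the last sample whose envelope exceeds
    threshold_factor times the mean pre-event noise level (assumed rule).
    """
    if onset_index <= 0:
        raise ValueError("need at least one pre-event noise sample")
    noise_level = sum(envelope[:onset_index]) / onset_index
    threshold = threshold_factor * noise_level
    end_index = onset_index
    for i in range(onset_index, len(envelope)):
        if envelope[i] > threshold:
            end_index = i                 # remember the latest sample above noise
    return (end_index - onset_index) / sampling_rate


# Toy envelope sampled at 1 Hz: 5 s of noise, 20 samples of signal, noise again.
toy_envelope = [0.1] * 5 + [1.0] * 20 + [0.1] * 10
duration = event_duration(toy_envelope, sampling_rate=1.0, onset_index=5)
```

In practice the threshold factor would have to be tuned to the station noise; the value used here is only a placeholder.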

Hour H: H is the hour of event detection. Quarry blast
events occur during the day from 11:00 a.m. to
02:00 p.m. and from 05:00 p.m. to 06:00 p.m. Outside
these time intervals, explosions are absent, and hence
the seismicity pattern is not affected by anthropogenic
events. H can be expressed as follows:

H = hour + minute/60 + second/3600 
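
As a small worked example of this formula, the sketch below converts a detection time to a decimal hour. The crisp window check is included only to show how the blasting hours quoted above map onto H; the classifier itself uses fuzzy sets on H rather than this hard test. The function names and the date are hypothetical.

```python
from datetime import datetime


def decimal_hour(timestamp):
    """H = hour + minute/60 + second/3600."""
    return timestamp.hour + timestamp.minute / 60 + timestamp.second / 3600


def in_blasting_window(h):
    """Crisp check against the blasting hours quoted in the text (illustration only)."""
    return 11.0 <= h <= 14.0 or 17.0 <= h <= 18.0


h_blast = decimal_hour(datetime(2014, 8, 1, 11, 30, 0))   # 11:30:00
h_night = decimal_hour(datetime(2014, 8, 1, 3, 0, 0))     # 03:00:00
```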

Frequency content $E_s$: The spectral amplitude of each
seismic event signal is calculated in the frequency band
$[f_1, f_2] = [0.5, 1]$ Hz by the following equation:

$$E_s = \int_{f_1}^{f_2} a(f)\, df$$

where $a(f)$ denotes the amplitude of the spectrum at frequency $f$.

As this parameter can be significantly altered by noise, an
adaptive filter has been designed to subtract the noise
FFT from that of the event. The noise FFT is computed
from the pre-event noise signal. A typical example of
this process is illustrated in figure 5.
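
A rough sketch of the band-limited spectral amplitude is given below. It assumes NumPy and replaces the adaptive filter, whose design is not described in the text, with plain spectral subtraction clipped at zero; the integral is approximated by a rectangle rule over the FFT bins. All names and the synthetic signals are hypothetical.

```python
import numpy as np


def band_amplitude(event, noise, sampling_rate, f1=0.5, f2=1.0):
    """Approximate E_s: integral of the denoised amplitude spectrum over [f1, f2] Hz."""
    event = np.asarray(event, dtype=float)
    noise = np.asarray(noise, dtype=float)
    n = event.size
    freqs = np.fft.rfftfreq(n, d=1.0 / sampling_rate)
    event_amp = np.abs(np.fft.rfft(event)) / n
    # Zero-pad (or truncate) the noise window to n samples so the bins line up,
    # and normalise by the number of noise samples actually used.
    noise_amp = np.abs(np.fft.rfft(noise, n=n)) / min(noise.size, n)
    # Plain spectral subtraction, clipped at zero (stand-in for the adaptive filter).
    clean_amp = np.maximum(event_amp - noise_amp, 0.0)
    band = (freqs >= f1) & (freqs <= f2)
    df = freqs[1] - freqs[0]
    return float(np.sum(clean_amp[band]) * df)


# Synthetic 40 s records at 20 Hz: one dominated by 0.75 Hz, one by 3 Hz.
sampling_rate = 20.0
t = np.arange(800) / sampling_rate
noise = 0.01 * np.sin(2 * np.pi * 5.0 * t[:200])   # 10 s pre-event window
low_freq_event = np.sin(2 * np.pi * 0.75 * t)
high_freq_event = np.sin(2 * np.pi * 3.0 * t)

e_low = band_amplitude(low_freq_event, noise, sampling_rate)
e_high = band_amplitude(high_freq_event, noise, sampling_rate)
```

Only the record with energy inside the 0.5 to 1 Hz band yields a sizeable value, which is the contrast the parameter is meant to capture.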

Figure 5. Examples of denoising results for two seismic events: (a, e) the noisy seismograms and (b, f) their FFT; (c, g) FFT of the pre-event noise signal; (d, h) FFT of the denoised seismograms.

Figure 6. Schematic diagram of the Takagi-Sugeno fuzzy seismic signal classifier.

3. Fuzzy classifier

Fuzzy logic systems try to mimic the way humans interpret
and reason with real-world knowledge in the face of
uncertainty. Fuzzy logic is founded on fuzzy set theory
(proposed by Zadeh, 1965), a generalization of the classical
concept of a set in which membership is a matter of degree
rather than a binary property. Thus, the transition from a
fuzzy set to its neighbors is gradual rather than abrupt,
resulting in continuity and robustness. This gradual change
is expressed by a membership function $\mu$ valued in the
real unit interval [0, 1]. This theory provides an
approximate and yet effective means of describing the
characteristics of a system that is too complex to admit
precise mathematical analysis. Generally speaking, fuzzy
logic provides an appropriate tool for manipulating
real-world data, where information is often incomplete and
does not have sharply defined boundaries.

A fuzzy system implements a nonlinear mapping from its
input space to its output space. The process that describes
the input-output relation of a real system using fuzzy
logic is called fuzzy inference. Fuzzy inference systems
are nonlinear models that consist of three conceptual
components: a rule base, which contains a selection
of fuzzy rules; a database, which defines the membership
functions used in the fuzzy rules; and a reasoning
mechanism, which performs the inference procedure upon
the rules and given facts to derive a reasonable output or
conclusion. A typical fuzzy rule in a fuzzy system has the
following form:

$R_i$: IF ($x_1$ is $A_{i1}$) AND ($x_2$ is $A_{i2}$) AND ... AND ($x_n$ is $A_{in}$)

THEN $y_i$ is ...

$R_i$ indicates the $i$th rule (if the number of rules in the
system is $m$, then $i = 1, 2, \ldots, m$); $x_k$, $k = 1, \ldots, n$, are
the components of the input vector, and $y_i$ is the rule
consequent. The terms $A_{ik}$ represent the antecedent fuzzy
sets of the $i$th rule, used to partition the input space into
overlapping regions. Each fuzzy set $A_{ik}$ is described by its
membership function $\mu_{ik}$, which evaluates the degree to
which each input variable $x_k$ belongs to the fuzzy set $A_{ik}$
through the corresponding membership value $\mu_{ik}(x_k)$. The
membership values $\mu_{ik}(x_k)$ vary in the range [0, 1]. The
structure of the rule consequents depends on the type of
fuzzy inference system under consideration. In this work
the Takagi-Sugeno fuzzy system is employed. In this case,
the consequents of the rules are functions of the input
variables:

$$y_i = f_i(x_1, x_2, \ldots, x_n)$$

The functions $f_i$ are usually first-order polynomials, given
by:

$$f_i(x_1, x_2, \ldots, x_n) = b_{0,i} + b_{1,i} x_1 + b_{2,i} x_2 + \cdots + b_{n,i} x_n$$

Functioning of Takagi-Sugeno fuzzy system 
as a seismic classifier 

Let $C = \{EQ, QB\}$ denote the two classes of seismic
events, which can be described by a set of features or
attributes $(E, T_d, H, E_s)$; that is, a given event $z$ to
classify is an element $x = (e, t_d, h, e_s)$ of
$X = E \times T_d \times H \times E_s$, where $x_i$ is the value
taken by attribute $i$ for this event. In the sequel, $X_i$
will indicate either the attribute (i.e., variable) itself
or its set of values, while $x_i$ indicates possible values
of $X_i$. The problem of designing the classifier is to
define a mapping $F$ such that:

$$F: E \times T_d \times H \times E_s \rightarrow C$$

Our classifier is a zero-order Sugeno fuzzy system, in
which $f_i$ is a constant:

$$f_i(e, t_d, h, e_s) = b_{0,i}$$

Therefore, $R_i$ can be rewritten as follows:

$R_i$: IF ($e$ is $A_{i1}$) AND ($t_d$ is $A_{i2}$) AND ($h$ is $A_{i3}$) AND ($e_s$ is $A_{i4}$) THEN $y_i = b_{0,i}$

$$b_{0,i} = \begin{cases} c_1 & \text{for EQ} \\ c_2 & \text{for QB} \end{cases}$$

Fig. 6 shows a schematic diagram of the functioning of
our Sugeno fuzzy classifier. The classification of a
given seismic event $z = (e, t_d, h, e_s)$ involves the following
steps:

Fuzzifying the input.

Calculation of the degree of fulfilment $\mu_i$ of each
rule. The degree of fulfilment of a rule evaluates
the compatibility of a given input vector with
the antecedent of the rule (i.e. the IF part). It is
normally evaluated using a t-norm, such as the
algebraic product:

$$\mu_i(z) = \mu_{i1}(e) \cdot \mu_{i2}(t_d) \cdot \mu_{i3}(h) \cdot \mu_{i4}(e_s)$$

Calculation of the output $g_i$ of each rule:

$$g_i(z) = \mu_i(z) \cdot y_i$$

Obtaining the final system output $g$ as the
weighted average of the outputs of the rules:

$$g(z) = \frac{\sum_{i=1}^{m} g_i(z)}{\sum_{i=1}^{m} \mu_i(z)}$$
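
The inference steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the classifier used in the paper: it has only two rules on two of the four inputs, the ramp-shaped membership functions and their breakpoints are invented, and the class constants are arbitrarily coded as 0 for EQ and 1 for QB.

```python
def ramp(x, lo, hi):
    """Linear rise from 0 at lo to 1 at hi, clamped to [0, 1]."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def sugeno_zero_order(inputs, rules):
    """Zero-order Takagi-Sugeno inference.

    inputs: dict mapping variable name to its crisp value.
    rules:  list of (antecedents, consequent) pairs, where antecedents maps
            a variable name to a membership function and consequent is a constant.
    """
    numerator = 0.0
    denominator = 0.0
    for antecedents, consequent in rules:
        fulfilment = 1.0
        for name, membership in antecedents.items():
            fulfilment *= membership(inputs[name])   # algebraic-product t-norm
        numerator += fulfilment * consequent          # g_i(z) = mu_i(z) * y_i
        denominator += fulfilment
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    return numerator / denominator                    # weighted average


# Hypothetical membership functions and class coding (EQ = 0, QB = 1).
def long_duration(t_d):
    return ramp(t_d, 30.0, 60.0)

def short_duration(t_d):
    return 1.0 - long_duration(t_d)

def high_content(e_s):
    return ramp(e_s, 2.0, 8.0)

def low_content(e_s):
    return 1.0 - high_content(e_s)

rules = [
    ({"t_d": long_duration, "e_s": high_content}, 0.0),   # THEN earthquake
    ({"t_d": short_duration, "e_s": low_content}, 1.0),   # THEN quarry blast
]

g_blast = sugeno_zero_order({"t_d": 20.0, "e_s": 1.0}, rules)
g_quake = sugeno_zero_order({"t_d": 120.0, "e_s": 20.0}, rules)
g_mixed = sugeno_zero_order({"t_d": 45.0, "e_s": 5.0}, rules)
```

An output near one of the two class constants indicates that class; an event that partially satisfies both rules, such as the last one, lands in between.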



4. Results and discussion 

In this section we implement, tune and test the
classifier. As described in section 3, the classifier maps
four inputs $(E, T_d, H, E_s)$ to an output $C$ indicating
the class (EQ or QB) to which the input belongs.
Table 1 lists the discriminative parameters together with
basic descriptive statistics for each class. Such
information is useful in defining the membership
functions. For the variable Hour, a histogram
was used instead (fig. 7).

Table 1. Discriminative parameters and their corresponding fundamental
descriptive statistical information for the two classes EQ and QB.

            T_d (s)              E                 E_s
          EQ       QB        EQ      QB        EQ       QB
min     13.69     9.06      0.47    0.09      0.08     0.22
max    736.80    32.03      0.98    0.55    190.12     7.44
mean   135.39    17.30      0.82    0.39     20.51     2.38
std    150.45     4.83      0.12    0.09     39.51     1.58

The statistical data show that the ranges of each variable
overlap between the two classes. Therefore, no single
variable can separate the two classes on its own. The most
widely accepted solution is to combine all these parameters
in one system, so that a region where one parameter overlaps
can be separated by the others. This is where the fuzzy
inference system plays an important role.
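
The overlap can be checked directly from the minimum and maximum values in Table 1. The short sketch below does only that; the dictionary layout and function name are illustrative.

```python
# (min, max) per class, copied from Table 1.
ranges = {
    "T_d": {"EQ": (13.69, 736.80), "QB": (9.06, 32.03)},
    "E":   {"EQ": (0.47, 0.98),    "QB": (0.09, 0.55)},
    "E_s": {"EQ": (0.08, 190.12),  "QB": (0.22, 7.44)},
}


def overlap(first, second):
    """Return the shared interval of two (min, max) ranges, or None if disjoint."""
    low = max(first[0], second[0])
    high = min(first[1], second[1])
    return (low, high) if low <= high else None


overlaps = {name: overlap(r["EQ"], r["QB"]) for name, r in ranges.items()}
```

Every parameter has a non-empty shared interval, so a single threshold on any one of them would leave some events ambiguous.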

The first step in fuzzy systems is the fuzzification stage,
which is characterized here by eleven input membership
functions: ‘night’, ‘morning’, ‘noon’, ‘afternoon’ and ‘evening’
for the variable Hour H; ‘short’ and ‘long’ for the variable
duration $T_d$; ‘Gaussian’ and ‘non-Gaussian’ for the
variable envelope E; and ‘low’ and ‘high’ for the variable
frequency content $E_s$. The output variable class C has two
further membership functions, ‘earthquake’ and ‘quarry blast’.
The membership functions adopted in this paper have
trapezoidal forms, except for those of the variable class,
which are singletons. Based on the analyst's knowledge and
the above fuzzy descriptions of each variable, we have
implemented the seismic classification system using a set
of fuzzy rules.
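
A trapezoidal membership function can be written in a few lines. The sketch below is illustrative only: the breakpoints for ‘noon’, ‘short’ and ‘long’ are hypothetical, since the paper does not list the values it used, and the numeric coding of the output singletons is likewise assumed.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership with feet a, d and shoulders b, c (a <= b <= c <= d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical breakpoints, chosen only to illustrate the shapes.
def noon(h):
    return trapezoid(h, 10.0, 11.0, 14.0, 15.0)

def short_duration(t_d):
    return trapezoid(t_d, 0.0, 0.0, 30.0, 45.0)       # left shoulder: a == b

def long_duration(t_d):
    return trapezoid(t_d, 30.0, 45.0, 800.0, 800.0)   # right shoulder: c == d

# Output singletons: each class is a single crisp value (assumed coding).
CLASS_VALUE = {"earthquake": 0.0, "quarry blast": 1.0}
```

With these placeholder breakpoints, a duration of 37.5 s belongs equally to ‘short’ and ‘long’, which is the kind of gradual transition between neighbouring sets described in section 3.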

In order to evaluate the classification accuracy of the
system, we have used a data set of 120 events, composed
of 60 events for the class EQ and 60 events for the class
QB. A comparison of the discriminant parameter values
for the two classes is displayed in figure 8.

In order to assess the influence of integrating several
parameters on the classifier performance, we compute the
percentage of correct classifications on the same data
when only two variables ($E$, $H$) are used as input to the
classifier, then three variables ($E$, $H$, $T_d$ or $E$, $H$, $E_s$)
and finally all four. The classification results are
presented in table 2.

Table 2. Classification results

Parameters     E, T_d, H, E_s    E, T_d, H    E, H, E_s    E, H
Results (%)    100               100          97.5         97.5



The classification results reveal that, using only the
envelope and hour parameters without duration and frequency
content, the classifier reaches a performance of 97.5%.
This means that the combination of the envelope input and
the hour-of-detection input provides the classifier with
most of the information needed to discriminate between the
two classes. Such performance cannot be achieved when only
one parameter is used. Here, the classifier bases its
decision on two variables, so that events that are ambiguous
in one variable may be resolved by the other.

Adding the duration parameter increases the performance
of the classifier to 100%. The classifier therefore reaches
its maximum performance with only three parameters
($E$, $T_d$, $H$), and the frequency content parameter ($E_s$)
has no effect on performance for this data set.
Nevertheless, we keep this parameter because it may be very
useful in situations where earthquakes with the same
envelope as quarry blasts occur in the time range when
quarry blasts are detonated.

Besides the greater flexibility that fuzzy systems offer
through fuzzy rather than crisp classification thresholds,
another important property of fuzzy systems is that they
allow an input-output mapping to be constructed from expert
knowledge as well as from the available data set. This
expands the generalization capabilities of the classifier.

The combination of several features in one system allows
us to exploit the information provided by each one, and
hence to separate overlapping regions and obtain an
automatic discrimination system that reaches the maximum
performance. Furthermore, using a fuzzy logic classifier
enables us to take advantage of its flexibility, its
ability to cope with different types of inputs and its
decision-making structure. In fact, fuzzy classifiers make
decisions based on many different variables and on expert
knowledge, thus offering much more capability than
classical statistical systems and training-based methods,
which suffer from significant performance degradation when
the available training data are insufficient.

Figure 7. Histogram of the parameter Hour for both earthquake and quarry blast classes.

Artificial Intelligence and Machine Learning Journal, ISSN 1687-4846, Volume 14, Issue 1, Delaware-USA, August 2014

Figure 8. Discriminant parameter values compared for 60 quarry blast
and earthquake events: (a) envelope, (b) duration and (c) frequency
content.

The results of this research have shown that the fuzzy
approach is well suited to distinguishing earthquake
events from quarry blasts. Such a technique, which achieves
high discrimination performance with low complexity, could
be employed in online discrimination. Moreover, because the
classifier is expressed as fuzzy logic rules, its
maintenance is straightforward. The characteristic features
of each class might change in the future, but the
underlying fuzzy classifier will remain the same. For
example, the quarry blasting times may change, but the
system can be recalibrated quickly by simply shifting the
fuzzy sets that define Hour, or by rewriting the fuzzy
rules, without touching the underlying program code.
Likewise, adding rules to the bottom of the list to expand
the scope of the knowledge base, as processes develop or
new events are found, is relatively easy and does not
require undoing what has already been done. In other words,
subsequent modification is easy. This last point is
perhaps the most important one: since fuzzy logic is built
on linguistic terms used by ordinary people on a daily
basis, it allows anyone to edit and modify the rules
without worrying about the underlying code.

5. Conclusion 

In this paper, an automatic method for discriminating
between earthquake and quarry blast events in Agadir’s
seismic database is developed using a Takagi-Sugeno
fuzzy inference system. Each event is represented by a
set of features deduced from the corresponding signal.
The fuzzy system interprets the values in the input vector
and, based on a set of fuzzy rules, assigns each input to
its class. Fuzzy logic is used here as a tool worth
considering for modelling nonlinear problems and treating
uncertain and imprecise data; it also deserves
consideration for its simplicity and transparency. This
simplicity, however, does not limit its effectiveness.

The classification results show that fuzzy classifiers
appear to be a powerful tool for dealing with seismic
signals, which are distorted, weak, noisy and complex.
Furthermore, since fuzzy logic is very useful for acquiring
knowledge from human experts, it is well suited to
situations where training data are insufficient. In
addition, fuzzy classes reflect reality better and allow
decision makers to describe input attributes and output
classes more intuitively using linguistic variables,
overlapping classes and approximate reasoning. Events that
belong to more than one class are treated in all classes
in which they have partial membership.

The results achieved confirm the good performance and low
complexity of the fuzzy classifier and demonstrate the
suitability of fuzzy logic for incorporating and
exploiting the information obtained from many parameters.
They also suggest that classification results can be
improved by adding other relevant features.